ABSTRACT


Techniques  are  presented  for  experimentally  computing 
discrete-time  model  equations  from  a  finite  set  of  sampled 
observations  of  the  system  inputs  and  outputs.   Existing 
modeling  techniques  typically  consider  simple  model  forms, 
and  often  make  limiting  assumptions  and  simplifications  for 
mathematical  convenience.   This  research  extends  these 
techniques  to  efficiently  obtain  a  more  accurate  model 
equation.   Four  key  points  are  examined:  (1)  form  of  the 
model  equation,  (2)  choice  of  the  error  minimization 
technique,  (3)  efficiency  of  model  determination  and 
evaluation  algorithms,  and  (4)  interpretation  of  the 
obtained  model  equations  in  typical  applications. 

A  new  algorithm  for  efficient  model  determination,  the 
Search  Indicator  Growth  Algorithm,  is  presented.   This 
iterative  algorithm  efficiently  evaluates  a  set  of  model 
terms and eliminates the undesired terms.  The technique
produces  more  accurate  and  robust  model  equations,  and 
offers  significant  computational  advantages  over  existing 
techniques.   Computer  simulated  experiments  illustrate  the 
effectiveness of this method.

I.   INTRODUCTION

A.   PRESENTATION  OF  THE  RESEARCH  PROBLEM 

This  research  examines  the  problem  of  experimentally 
developing  discrete-time  model  equations  to  represent,  or 
approximate,  the  input-output  behavior  of  both  linear  and 
nonlinear  systems  based  on  a  finite  set  of  sampled 
observations  of  the  system  inputs  and  outputs. 

The  traditional  approach  to  the  modeling  problem 
involves  selecting  a  particular  model  form,  estimating  the 
unknown  coefficients  of  the  model  from  the  observations,  and 
finally  verifying  the  quality  of  the  model.   The  particular 
model  form  is  commonly  chosen  for  mathematical  convenience 
or  from  some  physical  understanding  of  the  structure  of  the 
system.   Given  a  specific  model  equation  to  represent  a 
system,  a  number  of  techniques  exist  for  estimating  the 
values  of  the  model  coefficients  that  minimize  a  function  of 
the  fitting  error  between  the  model  and  the  system. 

We  are  interested  in  developing  the  techniques  needed  to 
obtain  a  suitable  model  when  the  underlying  structure  of  the 
system is unknown.  With a suitable model, a variety of
current  applications  can  be  approached,  including  the 
detection  and  evaluation  of  failures  that  affect  system 
performance.   The  problem  is  how  to  obtain  a  useful  model 
from the available observations of the system.

There are four key parts of this modeling problem.
First,  the  allowed  functional  forms  for  the  model  equation 
must  be  sufficiently  general  to  permit  adequate 
approximation  of  the  behavior  of  systems  of  interest  without 
requiring  an  unmanageable  number  of  terms.   (The  adequacy  of 
an  approximation  is  an  application  dependent  consideration 
and  will  be  discussed  later.)   For  most  nonlinear  systems  of 
interest,  the  discrete-time  model  form  typically  used  in  the 
literature  is  the  Volterra  series  model.   This  form  does  not 
adequately  satisfy  the  accuracy  or  compact  representation 
requirements  except  for  a  very  restricted  class  of  systems. 

The  second  key  part  is  the  determination  of  the  best 
error  minimization  technique  for  use  in  evaluating  any  model 
equation.   This  becomes  an  area  of  concern  in  terms  of  both 
accuracy  and  computational  efficiency  when  we  are  faced  with 
finite  length  data  sequences  and  measurement  noise. 

The  third  key  part  is  the  development  of  a  general 
technique  for  evolving  or  "growing"  a  model  equation  in  a 
computationally  efficient  and  accurate  manner.   Existing 
techniques  for  approaching  this  part  of  the  problem  are  very 
limited  as  to  the  functional  model  form  they  can  handle,  and 
often  make  somewhat  artificial  assumptions  and 
approximations  to  obtain  even  a  partial  solution.   The 
result  is  typically  an  inferior  model  of  the  system,  with 
insufficient  prediction  accuracy  or  an  excessive  number  of 
terms in the model equation.

The fourth part is determining if the obtained model is
the  "best"  model,  in  some  sense,  for  use  in  a  particular 
application.   This  last  point  involves  an  investigation  of 
whether  any  obtained  model  equation  is  a  preferred 
representation  of  the  system,  or  just  one  model  from  a  set 
of  functionally  equivalent  models.   We  examine  each  of  these 
four areas in this research and extend the existing
techniques  for  developing  model  equations. 

Most  researchers  have  approached  modeling  from  the 
coefficient  estimation  perspective,  and  there  is  a  large 
body  of  literature  on  techniques  for  efficiently  estimating 
the  values  of  the  coefficients  of  specific  models  once  the 
model  form  has  been  chosen.   Levinson  [Ref.  1],  Durbin 
[Ref. 2], Robinson [Ref. 3], and Morf [Ref. 4 and 5] have
developed  computationally  efficient  "recursive-in-order" 
algorithms  for  iteratively  estimating  the  coefficient  values 
of  certain  linear  models.   Recently,  Lee  [Ref.  6], 
Friedlander [Ref. 7], Perry [Ref. 8], and Parker and Perry
[Ref.  9]  have  reported  on  extensions  of  these  algorithms 
that  estimate  the  coefficient  values  of  a  wider  class  of 
model  forms.   These  techniques  are  shown  to  be  inadequate 
for  the  more  general  problem  of  an  unknown  system  form. 

We  approach  the  modeling  problem  from  a  different 
perspective,  that  of  systematically  growing  a  model  that 
minimizes  the  error  residual  signal  (performance  modeling), 
rather than just estimating the coefficients of an arbitrary
model form.  We have shifted our concern to performance
modeling  since  we  are  really  interested  in  the  behavior  of 
the  system,  and  not  the  values  of  the  coefficients  of  one 
particular  approximation  of  it.   This  may  seem  at  first  to 
be  a  relatively  minor  difference  in  approach,  but  the 
development  in  the  following  chapters  has  uncovered  some  new 
capabilities  for  more  efficient  model  determination. 


B.   DISCUSSION 

Modern  systems  are  both  dynamic  and  complex,  yet  they 
generally work by cause and effect, i.e., the set of inputs
operating on the system produces a set of outputs.
Knowledge of this relationship between input and output
enables us to approach various practical applications.  We
can  predict  reaction  based  on  the  action  applied,  control 
the  output  by  modification  of  the  input,  adapt  the 
behavioral  characteristics  so  that  a  given  stimulus  will 
result  in  a  desired  response,  diagnose  the  cause  by 
observing  the  effect,  and  detect  and  evaluate  changes 
(failures)  in  a  system's  performance. 

For  failure  detection  applications,  it  is  necessary  to 
have  some  standard  or  reference  by  which  to  make  the 
determination  that  a  failure  has  occurred.   An  often  used 
concept  provides  redundancy  in  the  form  of  one  or  more 
additional  systems  (or  subsystems)  operating  in  parallel 
with  the  original  system,  compares  the  various  outputs,  and 
uses an appropriate criterion to detect a failed system.

It is not feasible to have redundant systems in many
applications  so  a  simulation  is  used,  such  as  a  mathematical 
model  that  approximates  the  system's  performance 
characteristics.   In  some  cases,  this  model  can  be  designed 
from  a  detailed  knowledge  of  both  the  structure  and 
components  of  the  system  [Ref.  10].   Because  extensive 
detail  is  typically  required  for  creating  this  type  of 
model,  we  refer  to  this  as  "microscopic  modeling".   But 
there  is  often  insufficient  knowledge  of  the  system 
structure  and/or  component  behavior  of  real  world  systems, 
or  the  computational  complexity  is  excessive,  and  an 
alternate  modeling  technique  is  needed. 

One  concept  used  by  earlier  researchers  [Ref.  11  and  12] 
selects  a  specific  model  form  and  uses  it  to  characterize, 
or  approximate,  the  performance  of  the  system  from  input  and 
output  measurements  of  the  system.   This  concept  is  referred 
to  as  "input-output"  or  "macroscopic"  modeling,  and  the 
approximation  must  be  done  in  some  meaningful  sense  if  it  is 
to  be  useful.   The  choice  of  this  mathematical  form 
determines  both  the  quality  of  the  model  (extent  to  which  it 
approximates  the  system  behavior)  and  the  meaning  of  the 
model  coefficient  estimates.   If  the  mathematical  model  is 
an  exact  or  equivalent  representation  of  the  system,  there 
is  a  correspondence  between  the  set  of  system  parameters  and 
the  set  of  model  coefficients.   Various  properties  of  the 
model and the error residual (difference between the system
and realized model outputs under identical input conditions)
can be used in applications including failure detection and
evaluation.  The characteristics of the measured input
signal, and of any measurement noise, have a direct effect on
the model performance (see footnote 1).

Macroscopic  characterization  also  involves  model 
building  (growing),  which  adds  terms  to  a  given  model 
equation  to  better  fit  the  observed  data.   If  a  model  fit  is 
not  adequate  for  an  application,  then  the  standard  technique 
in  the  literature  [Ref.  11]  guesses  at  a  "better"  model  and 
fits  to  it  the  same  data.   This  "brute-force"  technique 
makes  little  use  of  the  specific  features  of  the 
unacceptable  model,  and  blindly  continues  until  (hopefully) 
an  adequate  fit  is  obtained. 

This  technique  makes  some  physical  and  practical  sense 
when  dealing  with  a  simple  model  form  corresponding  to  the 
known  form  of  the  system.   Examples  include  linear  transfer 
functions  (ARMA  models)  and  static  polynomials.   In  these 
cases,  each  successively  larger  model  provides  a  better  fit 
and the preceding fit can be considered as a reduced order
model of the system.  This technique is of dubious value
when the form of the system is unknown.  We need to develop
alternative growth techniques that are more useful in the
general case.

1   Two interesting cases regarding the input signal are:
(1) we have little or no control over the characteristics of
the input signal, and (2) the input signal (or probe) is
under our complete control for the system characterization
process.  For the work that follows, we consider case (2)
with the assumption that the input measurements are
representative of the normal system operation.  Section E of
Chapter VI examines both of these situations.  The impact of
measurement noise is also addressed in Chapter III.

A  particular  recursive  algorithm  was  introduced  by 
Levinson  [Ref.  1],  and  adapted  by  Durbin  [Ref.  2],  to 
efficiently  obtain  the  solution  of  a  specially  structured 
set  of  linear  equations.   In  using  this  algorithm,  a  crucial 
simplification  was  often  made  by  earlier  researchers  [Ref. 
4-9,  13  and  14].   By  limiting  the  form  of  the  model  and 
assuming  that  the  input  sampled  data  sequence  is  ergodic,  a 
special  mathematical  structure  can  be  induced  into  the  model 
evaluation  equations,  and  exploited  to  save  a  significant 
amount  of  mathematical  computation.   This  ergodic  assumption 
has  been  rationalized  from  different  points  of  view,  and  has 
resulted  in  various  related  evaluation  techniques.   The 
simplification  will  be  examined  in  detail  in  Chapter  III, 
since  its  true  effects  do  not  appear  clearly  in  the 
literature.   While  the  simplification  appears  reasonable  in 
the  isolated  context  in  which  it  was  made,  it  will  be  shown 
that  the  net  effect  of  model  evaluation  using  this 
simplification is typically that the obtained model is
suboptimal  in  prediction  performance. 

Even  without  these  assumptions,  the  recursive  solution 
algorithm  has  limited  usefulness.   While  feasible  for 
typical linear model forms, it is shown that the recursive
algorithm rapidly becomes computationally prohibitive when
considering  general  nonlinear  models.   Alternative  model 
growth  techniques  are  therefore  needed. 

It  is  often  overlooked  that  from  the  same  input 
sequence,  different  system  equations  can  give  the  same 
output  sequence.   The  result  is  indistinguishable 
performance  characteristics  from  structurally  different 
systems, or equivalently, multiple different (and often
independent)  model  equations  each  adequately  describing  the 
performance  characteristics  of  a  single  system. 

This  point  will  be  discussed  further  because  it  has 
implications  for  the  fault  detection  and  evaluation 
application.   Using  the  integer  n  as  the  discrete  time 
index,  the  system  input  sequence  is  denoted  as  {u(n)},  the 
output sequence as {y(n)}, and the residual sequence due to
inaccuracies  in  the  model  equation  as  {e(n)}.   Assume  a 
suitable  model  has  been  obtained  and  is  used  in  conjunction 
with  the  real  system  as  shown  below. 


[Figure 1: Configuration for Fault Detection.  The input sequence {u(n)} drives both the SYSTEM and the MODEL; the difference between the system output sequence {y(n)} and the model output forms the residual sequence {e(n)}.]

A significant change in the characteristics of the error
residual  sequence  {e(n)}  could  be  used  to  indicate  that 
there  has  been  a  change  in  the  behavior  of  the  system  (fault 
detection).   There  are  applications  where  we  would  like  to 
uniquely  determine  the  change  in  the  system  (fault 
evaluation).   This  last  capability  requires  that  there  exist 
a  unique  one-to-one  relationship  between  the  value  of  the 
system  parameter  that  changed,  and  the  resulting  value  of  a 
model  coefficient.   This  is  obviously  not  possible  when 
there  are  two  or  more  structurally  different  but 
equivalently  performing  model  equations. 
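
As a minimal sketch of how the configuration of Figure 1 might be exercised, the following Python fragment simulates a system whose feedback coefficient changes partway through a record and forms the residual against a fixed model.  The system equation, the change from 0.5 to 0.6, the fault time, the input distribution, and all function names are assumptions chosen for illustration only.  The model here predicts y(n) from the measured past output, which is one of several possible ways to form the residual.

```python
import random

FAULT_TIME = 100  # hypothetical sample index at which the system parameter changes


def system_output(u):
    """Simulated system y(n) = 0.9 u(n) - c u(n-1) y(n-1); c jumps from 0.5 to 0.6."""
    y = []
    for n in range(len(u)):
        c = 0.5 if n < FAULT_TIME else 0.6
        feedback = u[n - 1] * y[n - 1] if n > 0 else 0.0
        y.append(0.9 * u[n] - c * feedback)
    return y


def residual(u, y):
    """e(n) = y(n) minus the prediction of a model fixed at c = 0.5."""
    e = []
    for n in range(len(u)):
        feedback = u[n - 1] * y[n - 1] if n > 0 else 0.0
        e.append(y[n] - (0.9 * u[n] - 0.5 * feedback))
    return e


random.seed(0)
u = [random.uniform(-1.0, 1.0) for _ in range(200)]
y = system_output(u)
e = residual(u, y)

# Largest residual magnitude before and after the parameter change.
before = max(abs(x) for x in e[:FAULT_TIME])
after = max(abs(x) for x in e[FAULT_TIME:])
print(before, after)
```

In this sketch the residual is zero while the model matches the system and becomes nonzero once the coefficient changes, which is the fault detection event.  Relating that residual back to the specific parameter that changed is the fault evaluation problem, and it is the harder of the two.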

The  existence  of  a  model  equation  that  exactly  describes 
a  finite  set  of  input-output  measurements  does  not  imply  any 
uniqueness properties [Ref. 15].  This can be demonstrated
by  considering  particular  examples  of  structurally  different 
but  equivalently  performing  models  for  a  given  input. 

Example 1.1:  The time series of measurements

{u(n); n = 0,1,2,3,...,7,...} = {1,1,3,7,17,41,99,239,...}  and

{y(n); n = 0,1,2,3,...,7,...} = {0,1,2,5,12,29,70,169,...}  are

equivalently described over any interval with n>1 by both of
the linearly independent model equations:

y(n) = u(n) - y(n-1)                    {1.1}

and

y(n) = u(n-1) + y(n-1)                  {1.2}

Example 1.2:  The time series of measurements

{u(n); n = 0,1,2,...,7,...} = {1,1,-1,2,-5,10,-22,47,...}  and

{y(n); n = 0,1,2,...,7,...} = {0,1,0,2,-3,7,-15,32,...}  are

equivalently described over any interval with n>2 by both of
the linearly independent model equations:

y(n) = u(n) + y(n-1)                    {1.3}

and

y(n) = u(n-2) - y(n-1) + y(n-2)         {1.4}
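
The claims of Examples 1.1 and 1.2 can be checked directly.  The short Python sketch below evaluates each model equation against the eight listed samples; the helper name `fits` and the function names for the four equations are illustrative assumptions, not notation used elsewhere in this work.

```python
def fits(u, y, model, start):
    """True if y(n) equals model(u, y, n) for every n from start to the end of the data."""
    return all(y[n] == model(u, y, n) for n in range(start, len(y)))


# Example 1.1 data and model equations {1.1} and {1.2}
u1 = [1, 1, 3, 7, 17, 41, 99, 239]
y1 = [0, 1, 2, 5, 12, 29, 70, 169]


def eq_1_1(u, y, n):
    return u[n] - y[n - 1]


def eq_1_2(u, y, n):
    return u[n - 1] + y[n - 1]


# Example 1.2 data and model equations {1.3} and {1.4}
u2 = [1, 1, -1, 2, -5, 10, -22, 47]
y2 = [0, 1, 0, 2, -3, 7, -15, 32]


def eq_1_3(u, y, n):
    return u[n] + y[n - 1]


def eq_1_4(u, y, n):
    return u[n - 2] - y[n - 1] + y[n - 2]


results = [
    fits(u1, y1, eq_1_1, 2),
    fits(u1, y1, eq_1_2, 2),
    fits(u2, y2, eq_1_3, 3),
    fits(u2, y2, eq_1_4, 3),
]
print(results)  # [True, True, True, True]
```

All four checks succeed, so the listed data alone cannot distinguish between the two model equations in either example.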

Example 1.3:  The equation

y(n) = .9u(n) - .5u(n-1)y(n-1)          {1.5}

and the equation

y(n) = .9u(n) - .45u(n-1)u(n-1) + .25u(n-1)u(n-2)y(n-2)    {1.6}

can be realized as shown in Figures 2 and 3 respectively.

[Figure 2: A block diagram realization of Equation {1.5}]

[Figure 3: A block diagram realization of Equation {1.6}]

These two different equations will produce the identical
output sequence {y(n); S<=n<=T} for any given input sequence
{u(n); S<=n<=T}.  Note that {1.5} and {1.6} are not
independent since {1.6} can be obtained from {1.5} directly
by the following recursion.  Start by replacing n by n-1 in {1.5}:

y(n-1) = .9u(n-1) - .5u(n-2)y(n-2)      {1.7}

Substituting {1.7} into the right side of {1.5} yields:

y(n) = .9u(n) - .5u(n-1)[.9u(n-1) - .5u(n-2)y(n-2)]

     = .9u(n) - .45u(n-1)u(n-1) + .25u(n-1)u(n-2)y(n-2)    {1.8}

Since {1.8} is the same as {1.6}, Equations {1.5} and {1.6}
are  not  independent.   There  could  be  the  case  where  {1.5} 
was  the  system  and  {1.6}  was  the  model,  or  vice-versa.   A 
change  in  one  system  parameter  would  not  be  uniquely  related 
to a resulting change in one model coefficient.  This also
explains why an experimentally obtained model may predict
system behavior without revealing cause and effect.
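
The equivalence of Equations {1.5} and {1.6} can also be illustrated numerically.  The sketch below is a minimal check under stated assumptions: a random input sequence, a zero initial output, and the second recursion started from the first two outputs of the first so that both share the same initial conditions.  The function names are illustrative.

```python
import random


def run_eq_1_5(u, y0):
    """y(n) = 0.9 u(n) - 0.5 u(n-1) y(n-1), started from y(0) = y0."""
    y = [y0]
    for n in range(1, len(u)):
        y.append(0.9 * u[n] - 0.5 * u[n - 1] * y[n - 1])
    return y


def run_eq_1_6(u, y0, y1):
    """y(n) = 0.9 u(n) - 0.45 u(n-1)^2 + 0.25 u(n-1) u(n-2) y(n-2), from y(0), y(1)."""
    y = [y0, y1]
    for n in range(2, len(u)):
        y.append(0.9 * u[n] - 0.45 * u[n - 1] ** 2
                 + 0.25 * u[n - 1] * u[n - 2] * y[n - 2])
    return y


random.seed(1)
u = [random.uniform(-1.0, 1.0) for _ in range(50)]
ya = run_eq_1_5(u, y0=0.0)
yb = run_eq_1_6(u, y0=ya[0], y1=ya[1])  # same initial conditions as ya

# Largest difference between the two output sequences (rounding error only).
max_diff = max(abs(a - b) for a, b in zip(ya, yb))
print(max_diff)
```

If the feedback coefficient .5 in {1.5} is replaced by a general value c, the same substitution gives the coefficients .9c and c squared in the second form.  A single system parameter therefore moves two model coefficients at once, which is why the change cannot be read from one coefficient alone.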

Measurement  data  of  a  physical  system  can  be  matched  by 
more  than  one  model  equation  for  a  number  of  reasons.   It 
may  be  by  coincidence;  the  input  may  keep  the  output  of  each 
model  exactly  the  same.   It  was  recently  shown  [Ref.  16] 
that  if  the  system  is  linear,  the  existence  of  linearly 
independent  models  that  match  the  input-output  data  can  be 
attributed  to  the  particular  inputs  that  generate  the 
measurements  from  which  these  independently  equivalent 
models are computed.

The preceding discussion shows that the following
problems  must  be  considered.   If  we  obtain  a  set  of  models 
each  acceptably  describing  the  input-output  performance  of  a 
system, can we determine if any of these models include the
structural  properties  of  the  actual  system  that  produced  the 
data?   Naturally,  we  should  expect  that  we  will  rarely  be 
able  to  obtain  a  model  with  the  same  detailed  structure  as 
the  system.   The  second  problem  is  how  to  determine  the 
"best"  model  for  a  particular  application  from  those 
candidate  models  in  the  equivalence  set.   Chapter  VIII  will 
address  these  problems  and  present  some  new  results  in  the 
context  of  a  particular  application. 

C.   OVERVIEW 

Chapter  II  provides  a  review  of  existing  general  linear 
and  nonlinear  model  forms  and  presents  an  extension  to  a 
unifying  general  model  that  we  will  use.   Chapter  III 
presents  various  error  minimization  techniques  for 
evaluating  candidate  model  equations  and  proves  the 
generally  superior  modeling  performance  of  one  particular 
technique  known  as  the  "Covariance"  least  squares  method, 
over  the  least  squares  technique  commonly  used  in  the 
literature.   Additive  measurement  noise  is  examined,  and  new 
expressions  are  developed  for  the  resulting  distortion  of 
the  fitting  error  and  of  the  coefficient  estimates  of  linear 
recursive models.

Chapter IV discusses the equations required to evaluate
the  performance  of  a  model  as  more  terms  are  included,  and 
presents  a  recursive  technique  for  evaluating  the  solution 
of  a  large  and  useful  class  of  equations.   The  computational 
advantages  of  this  technique  are  compared  with  the  direct 
least  squares  evaluation  of  each  model. 

Chapter  V  discusses  the  existing  model  growth  techniques 
based  on  the  parameter  estimation  approach.   It  shows  that 
the  recursive  technique  of  Chapter  IV  reduces  to  the 
commonly  used  parameter  estimation  technique  known  as 
Levinson's  Algorithm  when  two  restrictive  assumptions  are 
made.  The use of these assumptions typically produces
inferior  models  compared  to  those  produced  by  the  more 
general  technique.   Chapter  V  also  discusses  several 
possible nonlinear model growth algorithms that are logical
extensions  of  existing  linear  growth  techniques. 

Chapter  VI  presents  a  new  concept  in  iterative  model 
growth  based  on  the  developments  in  the  preceding  chapters, 
and  analyzes  the  advantages  and  limitations.   This  heuristic 
technique  is  shown  to  offer  significant  improvements  over 
the  existing  and  previously  discussed  methods.   Chapter  VII 
gives  the  results  of  computer  simulations  and  real  world 
experimental  comparisons  of  the  model  growth  techniques 
developed  in  this  research. 

Chapter  VIII  examines  three  additional  applications  for 
the modeling methods developed in this thesis.  They are
fault detection, fault evaluation, and reduced order
modeling.   Specific  techniques  are  discussed  and  a  number  of 
concepts  are  proposed.   Conclusions  are  given  on  the  key 
results  of  this  work  and  areas  for  future  research  are 
outlined.

II.   CHOICE OF THE MODEL EQUATION FORM

A.   EXISTING  MODEL  FORMS 

We  are  concerned  with  the  determination  of  discrete-time 
models  for  both  linear  and  nonlinear  systems.   The  logical 
starting  point  is  a  discussion  of  existing  linear  model 
forms.   We  limit  discussion  to  single-input,  single-output 
systems  for  simplicity,  but  using  a  vector  notation,  the 
results  can  be  directly  extended  to  the  multiple-input, 
multiple-output  case.   We  also  limit  consideration  to  models 
whose input-output relationships can be described by
time-invariant difference equations (see footnote 2).

After  a  discussion  of  linear  models,  the  few  general 
dynamic nonlinear model forms found in the literature are
presented.   A  more  general  nonlinear  model  form  that 
subsumes  the  preceding  linear  and  nonlinear  models  is  then 
discussed.   One  particular  version  of  this  general  form  is 
then  developed  in  greater  detail,  and  utilized  in  the 
remainder  of  this  work. 


2   It is recognized that there are systems where an
input-output relationship cannot be exactly described by a
difference equation.  Consider a discrete quantizer whose
output y(n) at any instant n is equal to the integral part
of the input u(n) if the fractional part of u(n) is less
than 0.5, and is equal to 1 plus the integral part of u(n)
if the fractional part of u(n) is greater than or equal to
0.5.  Clearly the input-output relationship of such a system
cannot be accurately described by a difference equation.

Linear dynamic discrete models include the moving
average (MA), autoregressive (AR), and autoregressive-moving
average  (ARMA)  forms.   They  are  well  described  in  the 
literature  [Ref.  11,  17,  and  18]  and  are  briefly  discussed 
here  for  completeness. 

The  MA  model  predicts  the  current  value  of  the  output  of 
a  system  as  a  weighted  summation  of  the  current  and  q 
consecutive  preceding  values  of  the  system  input,  where  q  is 
known  as  the  order  (or  memory)  of  the  MA  model.   Following 
the  previously  used  convention,  the  sampled  observations  of 
the system input are denoted as {u(n); S<=n<=T}, the system
output as {y(n); S<=n<=T}, and the residual error due to
inaccuracies in the model as {e(n); S<=n<=T}.  The model
equation can be written as:


$$
\begin{aligned}
y(n) &= a^{(q)}(0)\,u(n) + a^{(q)}(1)\,u(n-1) + \cdots + a^{(q)}(q)\,u(n-q) + e(n) \\
     &= \sum_{i=0}^{q} a^{(q)}(i)\,u(n-i) + e(n) \qquad \{2.1\}
\end{aligned}
$$

The coefficients are the $a^{(q)}(i)$ factors that multiply each
corresponding $i$-th delayed input term.  The (q) superscript
is  used  to  emphasize  the  dependency  of  each  coefficient 
value  on  the  order  of  the  model,  and  the  superscript 
notation  is  dropped  when  no  ambiguity  results. 

These  models  are  called  moving  average  because  the 
current output is a weighted average of a finite "window"
passing over the present and past input values.  Models of
the  form  of  Eq.  {2.1}  are  denoted  as  MA(q). 

The p-th order AR model predicts the current value of the
system output as a weighted summation of p consecutive
preceding output values.  Using similar notation, the
equation for this model is written as:

$$
\begin{aligned}
y(n) &= b^{(p)}(1)\,y(n-1) + b^{(p)}(2)\,y(n-2) + \cdots + b^{(p)}(p)\,y(n-p) + e(n) \\
     &= \sum_{j=1}^{p} b^{(p)}(j)\,y(n-j) + e(n) \qquad \{2.2\}
\end{aligned}
$$

The coefficients are the $b^{(p)}(j)$ factors that multiply each
corresponding $j$-th delayed output term.  Models of the form
of Eq. {2.2} are denoted as AR(p).

Despite their simple form, MA(q) and AR(p) modeling of
even simple linear systems often requires an excessively
large number of model terms (a high order model).  A natural
extension of these two models is a combination of both.
Such mixed models are called autoregressive-moving average,
or ARMA, models of orders p and q, and are often written as
ARMA(p,q).  The ARMA model predicts the current value of the
system output as a weighted summation of the current value
of the system input, q consecutive preceding values of the
system input, and p consecutive preceding values of the
system output.  Combining the previous notations produces
the model equation:

$$
\begin{aligned}
y(n) &= \alpha^{(q)}(0)\,u(n) + \alpha^{(q)}(1)\,u(n-1) + \cdots + \alpha^{(q)}(q)\,u(n-q) \\
     &\quad + \beta^{(p)}(1)\,y(n-1) + \beta^{(p)}(2)\,y(n-2) + \cdots + \beta^{(p)}(p)\,y(n-p) + e(n) \\
     &= \sum_{i=0}^{q} \alpha^{(q)}(i)\,u(n-i) + \sum_{j=1}^{p} \beta^{(p)}(j)\,y(n-j) + e(n) \qquad \{2.3\}
\end{aligned}
$$

The coefficients are the $\alpha^{(q)}(i)$ and $\beta^{(p)}(j)$ factors that
multiply each corresponding $i$-th delayed input term and
$j$-th delayed output term, respectively.  Real world linear
systems typically include feedback, and can be adequately
modeled with the smallest number of terms by an ARMA model.
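
The three linear forms can be evaluated by a single routine.  The Python sketch below computes the ARMA prediction of Eq. {2.3} without the residual term; the function name `arma_predict` and the list layout of the coefficients are assumptions made for illustration.  An MA(q) model corresponds to an empty output coefficient list and an AR(p) model to an empty input coefficient list.

```python
def arma_predict(alpha, beta, u, y, n):
    """Prediction of y(n) from the ARMA form, without the residual e(n).

    alpha[i] weights u(n-i) for i = 0..q; beta[j-1] weights y(n-j) for j = 1..p.
    Requires n >= max(p, q) so that every delayed sample exists.
    """
    q, p = len(alpha) - 1, len(beta)
    if n < max(p, q):
        raise ValueError("not enough past samples")
    ma_part = sum(alpha[i] * u[n - i] for i in range(q + 1))
    ar_part = sum(beta[j - 1] * y[n - j] for j in range(1, p + 1))
    return ma_part + ar_part


# Data of Example 1.1, described by two different first-order models.
u = [1, 1, 3, 7, 17, 41, 99, 239]
y = [0, 1, 2, 5, 12, 29, 70, 169]

pred_a = arma_predict([1], [-1], u, y, 3)    # y(n) = u(n) - y(n-1)
pred_b = arma_predict([0, 1], [1], u, y, 3)  # y(n) = u(n-1) + y(n-1)
print(pred_a, pred_b, y[3])  # 5 5 5
```

Both coefficient sets reproduce y(3) = 5, which restates Example 1.1 in the ARMA notation.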

The literature discusses two general dynamic
discrete-time nonlinear models.  The Volterra model [Ref. 19, 20 and
21] can be thought of as a nonlinear generalization of the
MA model.  This model predicts the current value of the
system output as a linearly weighted summation of increasing
degree products of the current and m consecutive preceding
input values.  Using the typical notation followed in the
literature as a guide, the equation for this model form is
written as:


$$
y(n) = \sum_{k=1}^{d} f_k[u(n-i);\ i=0,1,2,\ldots,m] + e(n) \qquad \{2.4\}
$$

where

$$
f_k[u(n-i);\ i=0,1,2,\ldots,m] = \sum_{g_1=0}^{m} \sum_{g_2=g_1}^{m} \cdots \sum_{g_k=g_{k-1}}^{m} a_k(g_1,g_2,\ldots,g_k) \prod_{i=1}^{k} u(n-g_i) \qquad \{2.5\}
$$

and $a_k(g_1,g_2,\ldots,g_k)$ is called the $k$-th degree Volterra kernel.
When the degree d equals 1, Eq. {2.4} reduces to the form of
the MA model.

The  Volterra  model  is  based  only  on  a  sum  of  products  of 
past  and  present  input  values.   Because  of  this  nonrecursive 
form,  the  Volterra  model  is  unable  to  compactly  represent  a 
system  that  includes  significant  feedback.   A  model  of  the 
form  of  Eq.  {2.4}  is  denoted  as  VOL(d,m).   The  lower  limits 
of  the  summations  in  Eq.  {2.5}  were  purposely  chosen  to 
eliminate  redundant  terms,  and  therefore  we  have  minimized 
the  number  of  equations  that  need  to  be  solved  in  the 
evaluation  of  any  particular  VOL(d,m).   The  upper  summation 
limits  of  Eq.  {2.5}  are  all  set  equal  to  the  integer  m  for 
notational  clarity  at  this  point.   We  could,  of  course,  use 
a  more  general  notational  convention  for  the  upper  summation 
limits (e.g. $m_i$; i=1,2,...).  A more general upper summation
limit  notation  would  produce  more  complexity  in  the 
equations,  and  offers  no  specific  advantages  for  the  problem 
examined in this thesis.  The Volterra model of a system may
not require all of the terms indicated by Eq. {2.5}.
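
To give a sense of how quickly the number of candidate terms grows, the following Python sketch enumerates the non-redundant lag combinations of a VOL(d,m) model and evaluates a prediction from a given set of kernel values.  The function names and the dictionary used to hold the kernel values are illustrative assumptions.  The ordering g1 <= g2 <= ... <= gk mirrors the lower summation limits chosen above to eliminate redundant terms.

```python
from itertools import combinations_with_replacement


def volterra_terms(d, m):
    """Lag tuples (g1 <= g2 <= ... <= gk) for k = 1..d, each lag in 0..m."""
    terms = []
    for k in range(1, d + 1):
        terms.extend(combinations_with_replacement(range(m + 1), k))
    return terms


def volterra_predict(kernels, u, n):
    """Sum over lag tuples of kernel value times the product of u(n - g)."""
    total = 0.0
    for lags, a in kernels.items():
        prod = 1.0
        for g in lags:
            prod *= u[n - g]
        total += a * prod
    return total


print(len(volterra_terms(1, 10)))  # 11, the MA(10) terms
print(len(volterra_terms(3, 10)))  # 363

# Hypothetical kernel values: 2 u(n) + 3 u(n) u(n-1)
kernels = {(0,): 2.0, (0, 1): 3.0}
print(volterra_predict(kernels, [1.0, 2.0, 4.0], 2))  # 32.0
```

Even a modest VOL(3,10) model has 363 candidate terms, which is why a model growth technique that evaluates and discards terms efficiently matters.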

The  Bilinear  model  [Ref.  22,  23,  24  and  25]  predicts  the 
current  output  of  the  system  as  a  linearly  weighted 
summation  of  the  current  and  m  consecutive  preceding  input 
values,  plus  a  linearly  weighted  term  composed  of  the 
product  of  one  of  m  preceding  output  values  with  the  current 
or  one  of  m    preceding  input  values.   Using  the  typical 
notation  found  in  the  literature,  the  equation  for  this 
model form is written as:

$$
y(n) = \sum_{i=0}^{m} a^{(m)}(i)\,u(n-i) + \sum_{i=0}^{m} \sum_{j=1}^{m} c^{(m)}(i,j)\,u(n-i)\,y(n-j) + e(n) \qquad \{2.6\}
$$

This  form  includes  bilinear  terms  composed  of  the 
products  of  specific  input  and  output  factors,  a  feature  not 
found  in  the  previously  discussed  model  forms.   However,  it 
is  limited  to  the  one  type  of  nonlinear  form  shown  above. 
Models of the form of Eq. {2.6} are denoted as BIL(m).  The
restriction to equal upper summation limits is again used
for clarity.  The Bilinear model of a system may not require
all of the terms indicated by Eq. {2.6}.
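
A minimal Python sketch of the BIL(m) prediction, without the residual term, is given below.  The function name and the nested-list layout of the coefficients are assumptions for illustration.  As a check, Equation {1.5} of Chapter I is a BIL(1) model with a(0) = .9, a(1) = 0, c(0,1) = 0 and c(1,1) = -.5.

```python
def bilinear_predict(a, c, u, y, n):
    """Bilinear prediction: a[i] weights u(n-i), c[i][j-1] weights u(n-i) * y(n-j).

    i runs over 0..m and j over 1..m, where m = len(a) - 1.
    """
    m = len(a) - 1
    linear = sum(a[i] * u[n - i] for i in range(m + 1))
    cross = sum(c[i][j - 1] * u[n - i] * y[n - j]
                for i in range(m + 1) for j in range(1, m + 1))
    return linear + cross


# Equation {1.5}, y(n) = .9u(n) - .5u(n-1)y(n-1), written as BIL(1).
a = [0.9, 0.0]
c = [[0.0], [-0.5]]
pred = bilinear_predict(a, c, u=[1.0, 2.0], y=[3.0, 0.0], n=1)
print(pred)  # 0.9*2 - 0.5*1*3, approximately 0.3
```

A full BIL(m) model has (m+1) linear coefficients and m(m+1) bilinear coefficients, for (m+1) squared in total.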

B.   DEVELOPMENT  OF  A  MORE  GENERAL  MODEL  FORM 

When  used  for  the  modeling  of  a  typical  nonlinear 
system,  the  Volterra  model  form  suffers  from  the  same 
limitations  as  the  MA  model  form  does  for  linear  systems. 
The  existence  of  any  feedback  in  a  system  will  usually 
result  in  the  requirement  for  the  order  m  to  be  very  large 
in Eq. {2.1} in the linear case, or in Eq. {2.4} in the
nonlinear  case.   This  property  of  nonrecursive  model  forms 
results  in  the  need  for  an  unacceptably  large  number  of 
model  terms  to  adequately  represent  the  behavior  of  typical 
systems.
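
A short worked example may make the size of this penalty concrete.  The first-order system and the value b = 0.95 used below are assumptions chosen for illustration only.

```latex
% A first-order recursive system with feedback coefficient b, |b| < 1:
%   y(n) = b\,y(n-1) + u(n)
% Substituting the equation into itself q times gives
\[
  y(n) = \sum_{i=0}^{q} b^{i}\,u(n-i) \;+\; b^{\,q+1}\,y(n-q-1).
\]
% An MA(q) model keeps only the sum, so the neglected term carries
% the weight b^{q+1}.  For b = 0.95, requiring b^{q+1} < 0.01 gives
\[
  q + 1 > \frac{\ln 0.01}{\ln 0.95} \approx 89.8
  \quad\Longrightarrow\quad q \ge 89 .
\]
```

Under these assumptions the nonrecursive model needs 90 coefficients to push the neglected weight below one percent, while the recursive form describes the same system exactly with two.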

A natural extension of the Volterra model is to include
Volterra-like terms of the output of the system, in a manner
similar  to  the  relationship  between  the  MA  and  the  ARMA 
models.   An  investigation  of  the  effect  of  feedback  in  some 
common  nonlinear  systems  leads  to  the  conclusion  that  it  is 
also  useful  to  include  terms  that  are  extensions  of  the 
Bilinear  model  form.   Such  an  extension  has  been  made  and 
some  partial  results  concerning  different  versions  of  this 
new  model  form  have  been  published  [Ref.  9,  26  and  27], 

One  version  of  this  model  form  was  called  the  Nonlinear 
ARMA  model  in  references  9  and  26.   To  better  distinguish 
the  properties  of  a  more  general  form  of  the  model,  it  is 
denoted  as  the  Bivariate  Volterra  Model  (BVM)  in  reference 
27  and  in  the  work  that  follows. 

The coefficient notation of the previously discussed
linear and nonlinear model forms follows the conventions
found in the literature. Considerable thought was given to
the need for suitable notation for the more general and
complex model form, and also for the developments that
follow. It was decided to have a uniform coefficient
notation that could be applied to any of the models of this
chapter. Following is a compact equation for BVM(d,m), a
BVM of degree d and memory m, in terms of this uniform
coefficient notation.
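
$$
\begin{aligned}
y(n) ={}& \sum_{r=1}^{d} \; \sum_{h_1=0}^{m} \sum_{h_2=h_1}^{m} \cdots \sum_{h_r=h_{r-1}}^{m}
          \theta_{r;0}(h_1,\ldots,h_r) \prod_{i=1}^{r} u(n-h_i) \\
       &+ \sum_{s=1}^{d} \; \sum_{k_1=1}^{m} \sum_{k_2=k_1}^{m} \cdots \sum_{k_s=k_{s-1}}^{m}
          \theta_{0;s}(k_1,\ldots,k_s) \prod_{j=1}^{s} y(n-k_j) \\
       &+ \sum_{r=1}^{d-1} \sum_{s=1}^{d-r} \; \sum_{h_1=0}^{m} \cdots \sum_{h_r=h_{r-1}}^{m} \;
          \sum_{k_1=1}^{m} \cdots \sum_{k_s=k_{s-1}}^{m}
          \theta_{r;s}(h_1,\ldots,h_r,k_1,\ldots,k_s)
          \prod_{i=1}^{r} u(n-h_i) \prod_{j=1}^{s} y(n-k_j) \\
       &+ e(n)
\end{aligned}
$$   {2.7}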


The coefficients are the factors starting with $\theta_{r;0}$,
$\theta_{0;s}$, and $\theta_{r;s}$ in Eq. {2.7}. Note that two subscripts and
one or more parameters in parentheses are included for each
coefficient. A $\theta_{r;0}$ coefficient is used in conjunction with
a term composed exclusively of the product of r past and
present input factors. Likewise, a $\theta_{0;s}$ coefficient is used
in conjunction with a term composed exclusively of the
product of s past output values. The subscript in each of
these cases distinguishes the number of such factors in the
corresponding model term. Finally, we use $\theta_{r;s}$ for the
coefficient of the model term composed of r input factors
and s output factors. In all cases, the parameters in
parentheses in each model coefficient distinguish the
particular lag factors.

A model of this general form can be completely described
by either specifying a particular degree d and memory m, or
by just providing the distinguishing parameters and
subscripts of each desired coefficient. Following is an
example of a second degree, first order BVM equation, which
is denoted as BVM(2,1).

$$
\begin{aligned}
y(n) ={}& \theta_{1;0}(0)\, u(n) + \theta_{1;0}(1)\, u(n-1) + \theta_{2;0}(0,0)\, u(n)^2 \\
       &+ \theta_{2;0}(0,1)\, u(n)u(n-1) + \theta_{2;0}(1,1)\, u(n-1)^2 \\
       &+ \theta_{1;1}(0,1)\, u(n)y(n-1) + \theta_{1;1}(1,1)\, u(n-1)y(n-1) \\
       &+ \theta_{0;1}(1)\, y(n-1) + \theta_{0;2}(1,1)\, y(n-1)^2 + e(n)
\end{aligned}
$$   {2.8}


This coefficient notation completely specifies the model
terms, as demonstrated in the following examples.

$\theta_{3;0}(1,2,3)$ is the coefficient of term u(n-1)u(n-2)u(n-3)
$\theta_{2;1}(0,1,1)$ is the coefficient of term u(n)u(n-1)y(n-1)
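
The coefficient notation lends itself to direct enumeration. The sketch below lists every term of a full BVM(d,m) as a pair of lag tuples, using nondecreasing lags so that no product appears twice. The function name `bvm_terms` and the tuple encoding are assumptions made for illustration only.

```python
from itertools import combinations_with_replacement


def bvm_terms(d, m):
    """List (input_lags, output_lags) for every term of a full BVM(d, m).

    Input lags run over 0..m and output lags over 1..m. A pair with r input
    lags and s output lags corresponds to a coefficient theta_{r;s}(lags).
    """
    terms = []
    for r in range(0, d + 1):            # number of input factors
        for s in range(0, d + 1 - r):    # number of output factors, r + s <= d
            if r + s == 0:
                continue                 # no constant term in the model
            for h in combinations_with_replacement(range(0, m + 1), r):
                for k in combinations_with_replacement(range(1, m + 1), s):
                    terms.append((h, k))
    return terms


bvm21 = bvm_terms(2, 1)   # the nine terms of the BVM(2,1) example
```

In this encoding the pair `((0, 1), (1,))` stands for the term u(n)u(n-1)y(n-1), which first appears in a model of degree 3.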


The choice of the various lower summation limits in Eq.
{2.7} eliminates redundant model terms. The upper summation
limits are set equal to m for notational clarity, as was
done for the VOL and BIL forms. Because the upper summation
limits of Eq. {2.7} were all set equal to m in the preceding
pages, a compact expression for the number of coefficients
in a full BVM of degree d and memory m can be written as;

$$
c(d,m) = \sum_{r=1}^{d} \frac{\prod_{i=1}^{r}(m+i)}{r!}
       + \sum_{s=1}^{d} \frac{\prod_{j=1}^{s}(m-1+j)}{s!}
       + \sum_{r=1}^{d-1} \sum_{s=1}^{d-r}
         \frac{\prod_{i=1}^{r}(m+i)\, \prod_{j=1}^{s}(m-1+j)}{r!\, s!}
$$   {2.9}


This equation is used in subsequent chapters when the
computational complexity of evaluating this full model form
is considered.
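
The count of Eq. {2.9} is easy to evaluate numerically. The sketch below relies on the identities prod_{i=1..r}(m+i)/r! = C(m+r, r) and prod_{j=1..s}(m-1+j)/s! = C(m+s-1, s); the function name `bvm_count` is a hypothetical helper, not something defined in the original text.

```python
from math import comb


def bvm_count(d, m):
    """Number of coefficients in a full BVM(d, m), following Eq. {2.9}."""
    # Input-only terms: r factors chosen from the m+1 lags 0..m
    inputs = sum(comb(m + r, r) for r in range(1, d + 1))
    # Output-only terms: s factors chosen from the m lags 1..m
    outputs = sum(comb(m + s - 1, s) for s in range(1, d + 1))
    # Cross terms: r input factors and s output factors with r + s <= d
    cross = sum(comb(m + r, r) * comb(m + s - 1, s)
                for r in range(1, d) for s in range(1, d - r + 1))
    return inputs + outputs + cross


count_21 = bvm_count(2, 1)   # matches the nine terms of Eq. {2.8}
```

Even a modest BVM(3,3) already has over one hundred coefficients under this formula, which illustrates why efficient term selection matters.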


The BVM form defined in Eq. {2.7} is limited to the
summation of products of integer powers of past and present
input terms, and past output terms. Other functional forms
besides sums of integer products are possible, but this form
appears to be most tractable for our modeling. Examination
of Eq. {2.1} through Eq. {2.6} confirms that the BVM form
subsumes the AR, MA, ARMA, VOL, and BIL model forms.

For example, an ARMA(p,q) is subsumed by a BVM(1,m) when
the degree d of the BVM is set equal to 1, and
m = maximum(p,q). This only allows terms with descriptive
coefficients $\theta_{1;0}(i)$ or $\theta_{0;1}(j)$, where 0<=i<=m and 1<=j<=m.
This includes all the terms of an ARMA(p,q).

The  BVM  form  is  emphasized  because  it  includes  the  other 
general  forms  discussed  in  the  chapter  and  commonly  used  in 
the  literature.   Using  this  BVM  form  rather  than  the  ARMA, 
VOL  or  BIL  forms,  typically  produces  a  more  compact  and 
accurate  representation  of  nonlinear  systems  with  feedback. 
Chapter  VII  contains  several  computer  simulated  experiments 
demonstrating  this  point. 
III.   EXAMINATION OF ERROR MINIMIZATION TECHNIQUES

A.   DISCUSSION 

Assume  we  are  given  any  particular  model  form  linear  in 
some  finite  number  c  of  unknown  coefficients,  and  a  set  of 
input-output  data  of  length  N>c.   We  are  interested  in 
determining  the  particular  model  equation  that,  in  some 
meaningful  sense,  best  approximates  the  performance  of  the 
system  that  produced  the  N  output  measurements  from  the 
corresponding  N  input  measurements.   There  are  different 
error  criteria,  including  least  squares  and  minimax,  that 
could  be  used  in  minimizing  the  error  residual.   Least 
squares  techniques  minimize  the  average  squared  residual 
sequence  value  in  a  given  interval,  while  minimax  techniques 
minimize  the  maximum  absolute  value  of  the  residual  sequence 
in  the  interval. 
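
To make the two criteria concrete, the following sketch evaluates both on two made-up residual sequences. The function names and the sample values are illustrative assumptions; they show that the two criteria can rank the same pair of fits differently.

```python
def average_squared_residual(e):
    """Least squares criterion: mean of the squared residual values."""
    return sum(v * v for v in e) / len(e)


def max_absolute_residual(e):
    """Minimax criterion: largest absolute residual value in the interval."""
    return max(abs(v) for v in e)


# Two hypothetical residual sequences over the same interval
e_a = [0.0, 0.0, 0.0, 1.8]   # one large isolated error
e_b = [1.0, 1.0, 1.0, 1.0]   # uniformly spread error

ls_prefers_a = average_squared_residual(e_a) < average_squared_residual(e_b)
minimax_prefers_b = max_absolute_residual(e_b) < max_absolute_residual(e_a)
```

Here the least squares criterion favors the first sequence while the minimax criterion favors the second.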

Difficulties  with  least  squares,  including  degraded 
modeling  performance  under  noisy  conditions,  have  been 
reported  [Ref.  13].   Nevertheless,  it  has  been  decided  to 
investigate  the  use  of  least  squares  techniques  for  the 
following  reasons;  (1)  least  squares  minimization  for  models 
linear  in  the  coefficients  leads  to  a  set  of  tractable 
linear  equations  in  the  unknowns,  (2)  there  exists  a  large 
body  of  parameter  estimation  techniques  in  the  literature 
based  on  least  squares,  and  it  is  possible  to  extend  some  of 
these  for  our  model  growth  and  evaluation  problem. 




This chapter presents both the theoretical differences
and the results of computer simulated experimental comparisons
of various least squares formulations when applied to systems
characterization. In the simulation study, two criteria
used for comparison purposes are: (1) average squared
residual value (fitting error), and (2) accuracy in
estimating model coefficient values. An examination of the
effect of additive output noise is presented in the last
section.

An example of a linear-in-the-coefficients nonlinear
difference equation using the coefficient notation
introduced in Chapter II is;

$$ y(n) = \theta_{1;0}(1)\, u(n-1) + \theta_{0;2}(1,1)\, y(n-1)y(n-1) + \theta_{1;1}(0,1)\, u(n)y(n-1) $$   {3.1}

Eq. {3.1} contains both linear and nonlinear terms in the
input u(n) and output y(n). Since the coefficients $\theta_{r;s}$ all
enter in a linear fashion, this equation can be expressed in
compact vector notation (all vectors in this thesis are
column vectors). Defining a coefficient vector $\underline{\theta}$, where

$$ \underline{\theta}^T = [\, \theta_{1;0}(1),\ \theta_{0;2}(1,1),\ \theta_{1;1}(0,1) \,] $$   {3.2}

and a term vector $\underline{x}(n)$, where

$$ \underline{x}(n)^T = [\, u(n-1),\ y(n-1)y(n-1),\ u(n)y(n-1) \,] $$   {3.3}

Eq. {3.1} can now be expressed in the vector form;

$$ y(n) = \underline{\theta}^T \underline{x}(n) $$   {3.4}

Assume that we are given a finite set of measurements of
the input sequence {u(n); S<=n<=T} and the corresponding
output sequence {y(n); S<=n<=T} of a time-invariant and
causal system of unknown structure. To reproduce the
input-output behavior of this system within some moderately
small error, we choose a linear-in-the-unknown-coefficients
predictor model equation of the following form.

$$ y(n) = \underline{\theta}^T \underline{x}(n) + e(n) $$   {3.5}

where e(n) is the equation error of the model at time n;
$\underline{\theta}$ is a vector of length c containing as yet unknown
coefficients corresponding to each model term; and $\underline{x}(n)$
is a vector of c model terms, each of which is formed
from the product of a finite number of input and output
factors from the set:

[u(n-m), u(n-m+1), ..., u(n-1), u(n), y(n-m), y(n-m+1), ..., y(n-1)]   {3.6}

where m is some finite number called the memory (or
order) of the model. The maximum number of input
factors or output factors in any such product
combination will be called the degree d of the model.

Note that the above description fits the BVM(d,m) introduced
in Chapter II.
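
The construction of a term vector from lag factors can be sketched as follows. The encoding of each term as a pair of lag tuples, the helper names, and the sample data are assumptions for illustration; the three terms chosen are those of the term vector of Eq. {3.3}.

```python
from math import prod


def term_value(u, y, n, input_lags, output_lags):
    """Product of the selected input and output factors at time n."""
    return (prod(u[n - h] for h in input_lags)
            * prod(y[n - k] for k in output_lags))


def term_vector(u, y, n, terms):
    """Evaluate every model term at time n, giving the entries of x(n)."""
    return [term_value(u, y, n, h, k) for h, k in terms]


# Terms u(n-1), y(n-1)y(n-1), u(n)y(n-1) encoded as (input_lags, output_lags)
terms = [((1,), ()), ((), (1, 1)), ((0,), (1,))]
u = [1.0, 2.0, 3.0]     # hypothetical samples u(0), u(1), u(2)
y = [0.5, -1.0, 0.0]    # hypothetical samples y(0), y(1), y(2)
x2 = term_vector(u, y, 2, terms)
```

With the terms fixed, the model output is an inner product of this vector with the coefficient vector, which is what makes the estimation problem linear.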

The following example is used repeatedly for
illustration. Consider a linear MA model with q = m = 2,
and the coefficient and term vectors;

$$ \underline{\theta}^T = [\, \theta_{1;0}(0),\ \theta_{1;0}(1),\ \theta_{1;0}(2) \,] $$   {3.7}

$$ \underline{x}(n)^T = [\, u(n),\ u(n-1),\ u(n-2) \,] $$   {3.8}

This  model  form  is  linear  in  the  unknown  coefficients 
once  the  x(n)  is  specified,  and  we  choose  to  minimize  the 
following  nonnegative  least  squares  error  criterion; 

$$ J^2 = \frac{1}{N} \sum_{n=n_2}^{n_3} e(n)^2 = \frac{1}{N} \sum_{n=n_2}^{n_3} \left[ y(n) - \underline{\theta}^T \underline{x}(n) \right]^2 $$   {3.9}

where $n_2$ and $n_3$ take on fixed integer values, and
$N = n_3 - n_2 + 1$.

To  carry  out  the  least  squares  fit  in  compact  vector- 
matrix  notation,  we  use  underscored  lower  case  letters  to 
represent  vectors  and  capital  letters  to  represent  matrices. 
Scalars  are  represented  with  lower  case  letters  whenever 
possible,  but  occasionally  capital  letters  are  used. 

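Define the output vector $\underline{y}$, where

$$ \underline{y}^T = [\, y(n_2),\ y(n_2+1),\ \ldots,\ y(n_3) \,] $$   {3.10}

and the data measurement matrix X, where

$$ X^T = [\, \underline{x}(n_2)\ \ \underline{x}(n_2+1)\ \ \cdots\ \ \underline{x}(n_3) \,] $$   {3.11}

Substituting {3.5}, {3.7}, {3.10}, and {3.11} into {3.9} yields;

$$ J^2 = \frac{1}{N} (\underline{y} - X\underline{\theta})^T (\underline{y} - X\underline{\theta}) = \frac{1}{N} \left[ \underline{y}^T \underline{y} - \underline{\theta}^T X^T \underline{y} - \underline{y}^T X \underline{\theta} + \underline{\theta}^T X^T X \underline{\theta} \right] $$   {3.12}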

Following standard least squares theory [Ref. 11 - 14],
the evaluation equation for the coefficient vector $\underline{\theta}$ that
minimizes Eq. {3.12}, and the corresponding value of the
minimum error criterion are determined. The details of the
well known least squares derivation are included for
notational development. Differentiating Eq. {3.12} with
respect to the vector $\underline{\theta}$ using matrix calculus and equating
the result to zero yields;

$$ \frac{\partial J^2}{\partial \underline{\theta}} = 0 = -\frac{2}{N} X^T \underline{y} + \frac{2}{N} X^T X \underline{\theta} $$   {3.13}

Simplifying Eq. {3.13} produces;

$$ \frac{1}{N} X^T X \underline{\theta} = \frac{1}{N} X^T \underline{y} $$   {3.14}

It is convenient to use the following compact notation for
this set of c simultaneous linear equations.

$$ R \underline{\theta} = \underline{r} $$   {3.15}

where

$$ R = \frac{1}{N} X^T X $$   {3.16}

is the positive semi-definite least squares matrix of size c,
and

$$ \underline{r} = \frac{1}{N} X^T \underline{y} $$   {3.17}

is a column vector of size c. The factor 1/N is retained in
these definitions for subsequent distinction.

To ensure that Eq. {3.13} represents a unique minimum,
the second derivative of $J^2$ with respect to $\underline{\theta}$ must be
positive definite. Applying this result to Eq. {3.12} yields
the added condition that;

$$ \frac{\partial^2 J^2}{\partial \underline{\theta}^2} = \frac{2}{N} [\, X^T X \,] = 2R > 0 $$   {3.18}

Equations {3.15} are known as the normal equations, and
there is a unique solution if and only if matrix R is
positive definite. This unique solution is;

$$ \underline{\theta} = R^{-1} \underline{r} $$   {3.19}



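Using {3.16}, {3.17}, and {3.19} with {3.12}, the minimum
value of $J^2$ is;

$$ J^2 = \frac{1}{N} \underline{y}^T \underline{y} - \underline{r}^T R^{-1} \underline{r} = \frac{1}{N} \underline{y}^T \underline{y} - \underline{r}^T \underline{\theta} $$   {3.20}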
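
The chain from Eq. {3.16} to Eq. {3.20} can be checked numerically. The sketch below forms R and r, solves the normal equations with a small Gaussian elimination routine, and compares the directly computed average squared residual with the closed form of Eq. {3.20}. The helper names and the data are illustrative assumptions; a production code would normally use a numerical library instead.

```python
def normal_equations(X, y):
    """Return R = X^T X / N and r = X^T y / N as nested lists."""
    N, c = len(X), len(X[0])
    R = [[sum(X[n][i] * X[n][j] for n in range(N)) / N for j in range(c)]
         for i in range(c)]
    r = [sum(X[n][i] * y[n] for n in range(N)) / N for i in range(c)]
    return R, r


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


# Hypothetical data: roughly y = 2*x1 - x2 with a small disturbance
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.1, -1.0, 0.9, 3.0]
N = len(y)
R, r = normal_equations(X, y)
theta = solve(R, r)

# Average squared residual computed directly and by the closed form
j2_direct = sum((y[n] - sum(t * x for t, x in zip(theta, X[n]))) ** 2
                for n in range(N)) / N
j2_formula = sum(v * v for v in y) / N - sum(a * b for a, b in zip(r, theta))
```

The two values of the error criterion agree to rounding error, which is a useful sanity check on any implementation of the normal equations.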

We now investigate how the choice of the values of $n_2$
and $n_3$ in Eq. {3.9} changes the resulting value of the
error criterion $J^2$. The main purpose of this
investigation is to identify the reasons for the observed
numerical differences in the results of various least
squares formulations that commonly are used in the
literature. Many recent researchers [Ref. 1 - 9, 13, 14,
and 18 - 29] put emphasis on computational simplicity and
make assumptions or approximations related to the values of
$n_2$ and $n_3$ that induce special structure into the solution
equations {3.19} for $\underline{\theta}$. We consider a number of distinct
cases that are discussed but not clearly compared in the
literature.

(1) If $n_2 < S+m$ and $n_3 \le T$, this is equivalent to the
assumption that u(n)=0 for n<S and y(n)=0 for n<S,
and is known in the literature as the "Prewindowed"
case [Ref. 4, 6, 28 and 29].

(2) If $n_2 \ge S+m$ and $n_3 > T$, this is equivalent to the
assumption that u(n)=0 for n>T and y(n)=0 for n>T,
is known in the literature as the "Postwindowed"
case [Ref. 29], and is seldom used.

(3) If $n_2 < S+m$ and $n_3 > T$, we get both prewindowing
and postwindowing since this is equivalent to
assuming that u(n)=0 for n<S and n>T, and also
that y(n)=0 for n<S and n>T. This is equivalent
to rectangularly windowing the measurements. It is
known as the "Autocorrelation" case [Ref. 7, 14, 25,
and 28 - 30], and is the typically used method.

(4) If $n_2 \ge S+m$ and $n_3 \le T$, no window is applied
to the observed measurements, and the so called
"Covariance" case is realized [Ref. 28, 29 and 31].

Depending upon the specific choice of $n_2$ and $n_3$, there are
many different least squares error criterion values
$J^2(n_2,n_3)$, and related model coefficient estimates
$\underline{\theta}(n_2,n_3)$, for a given set of input-output data. The
literature typically reports the use of the Autocorrelation
method for statistical considerations and because this can
often lead to an efficient solution algorithm. This point is
discussed further in a subsequent section.
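
The four cases reduce to two yes/no questions about the summation limits, as the following sketch shows. The function name `ls_method` and its argument order are assumptions made for illustration.

```python
def ls_method(n2, n3, S, T, m):
    """Name the least squares formulation implied by the limits n2..n3.

    S and T bound the measured data, and m is the memory of the model.
    """
    pads_before = n2 < S + m   # some row needs samples earlier than S
    pads_after = n3 > T        # some row lies beyond the last sample T
    if pads_before and pads_after:
        return "Autocorrelation"
    if pads_before:
        return "Prewindowed"
    if pads_after:
        return "Postwindowed"
    return "Covariance"


# Limits for data observed over S=1..T=10 with memory m=2
names = [ls_method(1, 10, 1, 10, 2), ls_method(3, 12, 1, 10, 2),
         ls_method(1, 12, 1, 10, 2), ls_method(3, 10, 1, 10, 2)]
```

Only the Covariance case uses limits for which every entry of the data matrix is an actual measurement.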

Examination of these four different methods from the
unifying framework of the least squares equation {3.9}
reveals an interesting comparison basis for explaining the
subsequent differences in form and performance. This
development does not appear in the systems identification
literature and clearly indicates which error minimization
method should be used for the performance modeling approach
to the general model growth problem. The main result is
that the Covariance method generally gives superior modeling
results in terms of lower fitting error $J^2$ and more accurate
model coefficients in the vector $\underline{\theta}$. The differences in
these four methods are described analytically in terms of
the following example, generalized in the theorem that
follows, and finally demonstrated in a computer simulated
experiment.

EXAMPLE:   Let S = 1 and T = 10. Then we have the data

$$ \{u(n)\} = \{ u(1),\ u(2),\ u(3),\ \ldots,\ u(10) \} $$   {3.21}

$$ \{y(n)\} = \{ y(1),\ y(2),\ y(3),\ \ldots,\ y(10) \} $$   {3.22}

Let the model be given by the equation:

$$ y(n) = \theta_{1;0}(0)\, u(n) + \theta_{1;0}(1)\, u(n-1) + \theta_{1;0}(2)\, u(n-2) + e(n) $$   {3.23}

Using least squares

$$ J^2 = \frac{1}{N} \sum_{n=n_2}^{n_3} e(n)^2 $$   {3.24}

where $N = n_3 - n_2 + 1$, and where the coefficient vector is;

$$ \underline{\theta}^T = [\, \theta_{1;0}(0),\ \theta_{1;0}(1),\ \theta_{1;0}(2) \,] $$   {3.25}

leads to

$$ \frac{1}{N} [\, X^T X \,]\, \underline{\theta} = \frac{1}{N} X^T \underline{y} $$   {3.26}

where

$$ \underline{y}^T = [\, y(1),\ y(2),\ y(3),\ \ldots,\ y(10) \,] $$   {3.27}

and X is the N x 3 data matrix involving {u(n)}, whose
contents depend upon the choice of $n_2$ and $n_3$ as shown in
the following four cases.

Case 1:   Prewindowed method

Here $n_2 = S = 1$, $n_3 = T = 10$, and the data matrix becomes;

$$
X = \begin{bmatrix}
u(1)  & 0    & 0    \\
u(2)  & u(1) & 0    \\
u(3)  & u(2) & u(1) \\
u(4)  & u(3) & u(2) \\
u(5)  & u(4) & u(3) \\
u(6)  & u(5) & u(4) \\
u(7)  & u(6) & u(5) \\
u(8)  & u(7) & u(6) \\
u(9)  & u(8) & u(7) \\
u(10) & u(9) & u(8)
\end{bmatrix}
$$   {3.28}

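The solution of the normal equations {3.26} is now given by;

$$
\begin{bmatrix} \theta_{1;0}(0) \\ \theta_{1;0}(1) \\ \theta_{1;0}(2) \end{bmatrix}
=
\left[ \frac{1}{10}
\begin{bmatrix}
\sum_{n=1}^{10} u(n)^2      & \sum_{n=2}^{10} u(n)u(n-1)   & \sum_{n=3}^{10} u(n)u(n-2)   \\
\sum_{n=2}^{10} u(n)u(n-1)  & \sum_{n=2}^{10} u(n-1)^2     & \sum_{n=3}^{10} u(n-1)u(n-2) \\
\sum_{n=3}^{10} u(n)u(n-2)  & \sum_{n=3}^{10} u(n-1)u(n-2) & \sum_{n=3}^{10} u(n-2)^2
\end{bmatrix}
\right]^{-1}
\frac{1}{10}
\begin{bmatrix}
\sum_{n=1}^{10} u(n)y(n) \\ \sum_{n=2}^{10} u(n-1)y(n) \\ \sum_{n=3}^{10} u(n-2)y(n)
\end{bmatrix}
$$   {3.29}

Note that the square matrix in Eq. {3.29} has different
summation limits along each diagonal parallel to the main
diagonal.
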
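Case 2:   Postwindowed method

Here $n_2 = S+m = 1+2 = 3$, $n_3 = T+m = 10+2 = 12$, and the data
matrix becomes;

$$
X = \begin{bmatrix}
u(3)  & u(2)  & u(1)  \\
u(4)  & u(3)  & u(2)  \\
u(5)  & u(4)  & u(3)  \\
u(6)  & u(5)  & u(4)  \\
u(7)  & u(6)  & u(5)  \\
u(8)  & u(7)  & u(6)  \\
u(9)  & u(8)  & u(7)  \\
u(10) & u(9)  & u(8)  \\
0     & u(10) & u(9)  \\
0     & 0     & u(10)
\end{bmatrix}
$$   {3.30}

The solution of the normal equations {3.26} is now given by;

$$
\begin{bmatrix} \theta_{1;0}(0) \\ \theta_{1;0}(1) \\ \theta_{1;0}(2) \end{bmatrix}
=
\left[ \frac{1}{10}
\begin{bmatrix}
\sum_{n=3}^{10} u(n)^2      & \sum_{n=3}^{10} u(n)u(n-1)   & \sum_{n=3}^{10} u(n)u(n-2)   \\
\sum_{n=3}^{10} u(n)u(n-1)  & \sum_{n=3}^{11} u(n-1)^2     & \sum_{n=3}^{11} u(n-1)u(n-2) \\
\sum_{n=3}^{10} u(n)u(n-2)  & \sum_{n=3}^{11} u(n-1)u(n-2) & \sum_{n=3}^{12} u(n-2)^2
\end{bmatrix}
\right]^{-1}
\frac{1}{10}
\begin{bmatrix}
\sum_{n=3}^{10} u(n)y(n) \\ \sum_{n=3}^{10} u(n-1)y(n) \\ \sum_{n=3}^{10} u(n-2)y(n)
\end{bmatrix}
$$   {3.31}

Note that the square matrix in Eq. {3.31} has a different
set of summation limits along each diagonal parallel to the
main diagonal.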




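Case 3:   Autocorrelation method

Here $n_2 = S = 1$, $n_3 = T+m = 10+2 = 12$, and the data matrix
becomes;

$$
X = \begin{bmatrix}
u(1)  & 0     & 0     \\
u(2)  & u(1)  & 0     \\
u(3)  & u(2)  & u(1)  \\
u(4)  & u(3)  & u(2)  \\
u(5)  & u(4)  & u(3)  \\
u(6)  & u(5)  & u(4)  \\
u(7)  & u(6)  & u(5)  \\
u(8)  & u(7)  & u(6)  \\
u(9)  & u(8)  & u(7)  \\
u(10) & u(9)  & u(8)  \\
0     & u(10) & u(9)  \\
0     & 0     & u(10)
\end{bmatrix}
$$   {3.32}

The solution of the normal equations {3.26} is now given by;

$$
\begin{bmatrix} \theta_{1;0}(0) \\ \theta_{1;0}(1) \\ \theta_{1;0}(2) \end{bmatrix}
=
\left[ \frac{1}{12}
\begin{bmatrix}
\sum_{n=1}^{10} u(n)^2      & \sum_{n=2}^{10} u(n)u(n-1) & \sum_{n=3}^{10} u(n)u(n-2) \\
\sum_{n=2}^{10} u(n)u(n-1)  & \sum_{n=1}^{10} u(n)^2     & \sum_{n=2}^{10} u(n)u(n-1) \\
\sum_{n=3}^{10} u(n)u(n-2)  & \sum_{n=2}^{10} u(n)u(n-1) & \sum_{n=1}^{10} u(n)^2
\end{bmatrix}
\right]^{-1}
\frac{1}{12}
\begin{bmatrix}
\sum_{n=1}^{10} u(n)y(n) \\ \sum_{n=2}^{10} u(n-1)y(n) \\ \sum_{n=3}^{10} u(n-2)y(n)
\end{bmatrix}
$$   {3.33}

Note that the square matrix in Eq. {3.33} is symmetric,
Toeplitz (equal values along every diagonal parallel to the
main diagonal), and the summation limits are all the same
along any diagonal parallel to the main diagonal.
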
The particular structure of the symmetric Toeplitz
matrix in Eq. {3.33} was developed here strictly from a
consideration of the error minimization limits. The
literature contains numerous references to least squares
matrices with this special structure, but it is usually just
stated or developed along different lines (see footnote 3).
After presentation of the fourth case, we will discuss the
implications of each.


3   For example, Baheti [Ref. 23 and 24], Hsia [Ref. 14],
and Iserman [Ref. 13] all utilize what they call
"correlation analysis" where they assume that the input and
output sequences are ergodic, such that this special
Toeplitz structure results. This "ergodic assumption" can
be described mathematically as follows [Ref. 14, p. 44].
Consider a finite length discrete-time sequence of
measurements of some signal denoted by {s(n)}. If this is a
representative sample of an ergodic process, then the
following condition will hold. The value obtained from the
expression;

$$ \frac{1}{N+1} \sum_{n=i}^{i+N} s(n)s(n-j) $$

is invariant with respect to the integer i. If this special
condition holds, or is assumed, then the Toeplitz structure
of Eq. {3.33} will result because of the relationships;

$$
\begin{aligned}
\frac{1}{N+1} \sum_{n=i}^{i+N} s(n)s(n-j)
 &= \frac{1}{N+1} \sum_{n=i}^{i+N} s(n+1)s(n+1-j)
  = \frac{1}{N+1} \sum_{n=i+1}^{i+N+1} s(n)s(n-j) \\
 &= \frac{1}{N+1} \sum_{n=i}^{i+N} s(n+2)s(n+2-j)
  = \frac{1}{N+1} \sum_{n=i+2}^{i+N+2} s(n)s(n-j)
\end{aligned}
$$

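Case 4:   Covariance method

Here $n_2 = S+m = 1+2 = 3$, $n_3 = T = 10$, and the data matrix
becomes;

$$
X = \begin{bmatrix}
u(3)  & u(2) & u(1) \\
u(4)  & u(3) & u(2) \\
u(5)  & u(4) & u(3) \\
u(6)  & u(5) & u(4) \\
u(7)  & u(6) & u(5) \\
u(8)  & u(7) & u(6) \\
u(9)  & u(8) & u(7) \\
u(10) & u(9) & u(8)
\end{bmatrix}
$$   {3.34}

The solution of the normal equations {3.26} is now given by;

$$
\begin{bmatrix} \theta_{1;0}(0) \\ \theta_{1;0}(1) \\ \theta_{1;0}(2) \end{bmatrix}
=
\left[ \frac{1}{8}
\begin{bmatrix}
\sum_{n=3}^{10} u(n)^2      & \sum_{n=3}^{10} u(n)u(n-1)   & \sum_{n=3}^{10} u(n)u(n-2)   \\
\sum_{n=3}^{10} u(n)u(n-1)  & \sum_{n=3}^{10} u(n-1)^2     & \sum_{n=3}^{10} u(n-1)u(n-2) \\
\sum_{n=3}^{10} u(n)u(n-2)  & \sum_{n=3}^{10} u(n-1)u(n-2) & \sum_{n=3}^{10} u(n-2)^2
\end{bmatrix}
\right]^{-1}
\frac{1}{8}
\begin{bmatrix}
\sum_{n=3}^{10} u(n)y(n) \\ \sum_{n=3}^{10} u(n-1)y(n) \\ \sum_{n=3}^{10} u(n-2)y(n)
\end{bmatrix}
$$   {3.35}

Note that the square matrix in Eq. {3.35} is symmetric but
not Toeplitz, and the summation limits are all the same.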
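
The four data matrices and the structure of the resulting least squares matrices can be reproduced with a few lines of Python. In this sketch the helper names and the particular input sequence are assumptions chosen for illustration; samples outside S..T are replaced by zeros, which is exactly the windowing assumption of each case.

```python
def data_matrix(u, S, T, m, n2, n3):
    """Rows [u(n), u(n-1), ..., u(n-m)] for n = n2..n3.

    u maps a time index to its value; samples outside S..T are taken
    as zero, which is the windowing assumption.
    """
    def sample(k):
        return u[k] if S <= k <= T else 0.0
    return [[sample(n - i) for i in range(m + 1)] for n in range(n2, n3 + 1)]


def gram(X):
    """Return X^T X (the common 1/N factor does not affect the structure)."""
    c = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(c)]
            for i in range(c)]


def is_toeplitz(A, tol=1e-9):
    """True when every diagonal parallel to the main diagonal is constant."""
    n = len(A)
    return all(abs(A[i][j] - A[i - 1][j - 1]) <= tol
               for i in range(1, n) for j in range(1, n))


S, T, m = 1, 10, 2
u = {n: float((7 * n) % 11 - 5) for n in range(S, T + 1)}   # arbitrary test input
limits = {"Prewindowed": (S, T), "Postwindowed": (S + m, T + m),
          "Autocorrelation": (S, T + m), "Covariance": (S + m, T)}
toeplitz = {name: is_toeplitz(gram(data_matrix(u, S, T, m, n2, n3)))
            for name, (n2, n3) in limits.items()}
```

For this input only the Autocorrelation limits give a Toeplitz matrix, in agreement with the notes following each case.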

The main reason for the preceding four-case development
is to point out the specific condition under which the least
squares matrix becomes Toeplitz. This property is exploited
in Levinson's algorithm [Ref. 1], which solves the normal
equations with order of complexity proportional to the square
of the size of the least squares matrix R, rather than
proportional to the cube of the size of this matrix as occurs
in the other three cases shown. Details of this algorithm
are discussed later in Chapter V and Appendix B. For models
other than simple moving average, other researchers
[Ref. 3 - 9 and 21 - 25] constrained their model forms and
used the Autocorrelation method to form least squares
matrices with Toeplitz principal submatrices, and therefore
gained some computational advantage when solving these
equations using variations of Levinson's Algorithm. This
chapter proves that this technique is cumbersome,
unnecessary, and more importantly, generally produces
inferior models compared to those obtained by the Covariance
method.
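
For orientation, a textbook form of the Levinson recursion for a symmetric Toeplitz system is sketched below. It is not taken from the thesis (whose own treatment is deferred to Chapter V and Appendix B); the function name and test values are assumptions, and the routine presumes that every leading principal submatrix is nonsingular, as is the case for a positive definite matrix.

```python
def levinson_solve(t, b):
    """Solve T x = b for symmetric Toeplitz T with T[i][j] = t[abs(i - j)].

    Each pass of the loop costs O(k) operations, so the total work grows
    with the square of the matrix size.
    """
    n = len(b)
    f = [1.0 / t[0]]          # forward vector: T_k f = first unit vector
    x = [b[0] / t[0]]         # solution of the leading 1x1 system
    for k in range(1, n):
        # Error made by extending f with a zero (same value for the
        # reversed vector, because T is symmetric)
        eps = sum(t[k - i] * f[i] for i in range(k))
        fe = f + [0.0]
        be = [0.0] + f[::-1]
        denom = 1.0 - eps * eps
        f = [(fe[i] - eps * be[i]) / denom for i in range(k + 1)]
        back = f[::-1]        # backward vector: T_k back = last unit vector
        # Correct the extended solution so that row k is also satisfied
        eps_x = sum(t[k - i] * x[i] for i in range(k))
        x = [xi + (b[k] - eps_x) * bi for xi, bi in zip(x + [0.0], back)]
    return x


t = [4.0, 1.0, 0.5]          # first column of a hypothetical Toeplitz matrix
b = [1.0, 2.0, 3.0]
x = levinson_solve(t, b)
residual = [sum(t[abs(i - j)] * x[j] for j in range(3)) - b[i] for i in range(3)]
```

The routine never forms the full matrix; it only needs the first column, which is why the Toeplitz structure is so attractive computationally.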

The  constrained  Autocorrelation  method  models  are 
inferior  in  two  main  ways:  (1)  Only  specifically  related 
sets  of  model  terms  necessary  for  the  special  Toeplitz 
structure  are  allowed  in  the  model.   This  limits  model 
growth  flexibility,  increases  the  computational  burden,  and 
degrades  the  model  performance.   Further  discussion  on  these 
points  is  given  in  Chapter  V.   (2)  The  particular  choice  of 
data  interval  described  by  the  Autocorrelation  least  squares 
method  (or  its  statistical  equivalent)  typically  produces  a 
model  with  significantly  higher  fitting  error,  and 
substantially  larger  coefficient  error  than  the  Covariance 
method.   This  key  point  is  discussed  in  more  detail  in  the 
next  two  sections. 




B.   A THEOREM DESCRIBING THE CONDITION FOR SUPERIOR
     PERFORMANCE OF THE COVARIANCE METHOD

The  four  previous  methods  use  exactly  the  same  form  of 
computation;  they  differ  only  in  the  specific  data 
measurements used. The Prewindowed, Postwindowed, and
Autocorrelation  methods  supply  missing  zeros,  either  before 
or  after  the  measured  data,  or  both.   Thus  these  methods  are 
arithmetically  equivalent  to  the  Covariance  method  operating 
on  a  discontinuous  function,  and  it  is  well  known  that  it  is 
hard  for  least  squares  or  any  other  minimization  method  to 
handle  discontinuous  functions.   An  alternate  explanation  is 
that  the  first  three  methods  utilize  constraints  on  the  data 
values,  and  it  is  generally  found  that  a  constrained 
solution  is  inferior  to  the  optimum  (minimum  valued) 
solution. This suggests that the Prewindowed, Postwindowed,
and  Autocorrelation  methods  would  be  inferior  to  the 
Covariance  method. 

Simple computer simulation experiments given in the next
section confirm this reasoning. It remains, therefore, to
express mathematically the intuition, supported by these
results, that supplying missing data by a run of zeros is a
poor method to use. We are, of course, concerned with the
quality of the fit, the sum of the squares of the residuals.

The  first  step  is  to  examine  under  what  circumstances 
the  Prewindowed,  Postwindowed,  or  Autocorrelation  methods 
would  produce  a  lower  average  error  than  the  Covariance 
method.   Some  mathematical  notation  is  needed. 

Consider a finite set of dynamic input observations
{u(n); S<=n<=T} and corresponding output observations
{y(n); S<=n<=T} of some system, and a
linear-in-the-coefficients model equation relating the
present value of y(n) to functions of past values of y(n)
and present and past values of u(n). Denote the integer m as
the maximum discrete lag (order) of any term of the model
equation, and $\underline{\theta}$ as the coefficient vector. The model
equation can be written:

$$ y(n) = f[\, \underline{\theta},\ u(n-i),\ y(n-j);\ i=0,1,2,\ldots,m;\ j=1,2,\ldots,m \,] + e(n) $$   {3.36}

Let $\{e_1(n)\}$ represent the error residual and $J_1^2$
represent the average squared error obtained when a least
squares minimization is performed over the interval $(n_2,n_3)$.

$$ J_1^2 = \frac{1}{N_1} \sum_{n=n_2}^{n_3} e_1(n)^2 $$   {3.37}

where $n_2 = S+m$   {3.38}
and $n_3 = T$   {3.39}
and $N_1 = n_3 - n_2 + 1$   {3.40}


Let the length of the error minimization interval be
increased by a small amount $N_2 > 0$ to a larger region
$(n_1,n_4)$ where $n_1 < n_2$ and/or $n_4 > n_3$. This new region
includes the first region and available data points on
either or both sides of the first region. Missing data
points in the new data matrix X are padded with zero values.
Using the same model form, perform a least squares
minimization over the interval $(n_1,n_4)$. Denote the new
error residual as $\{e_2(n)\}$ and the average least squares
error as $J_2^2$.

$$
J_2^2 = \frac{1}{N_1+N_2} \sum_{n=n_1}^{n_4} e_2(n)^2
      = \frac{1}{N_1+N_2} \left[ \sum_{n=n_1}^{n_2-1} e_2(n)^2 + \sum_{n=n_2}^{n_3} e_2(n)^2 + \sum_{n=n_3+1}^{n_4} e_2(n)^2 \right]
$$   {3.41}

where $N_2 = (n_4 - n_1 + 1) - N_1$   {3.42}

Since $J_1^2$ is the least squares fit over $(n_2,n_3)$, it must
be less than or equal to the quantity;

$$ \frac{1}{N_1} \sum_{n=n_2}^{n_3} e_2(n)^2 $$   {3.43}

Let $E^2$ be the nonnegative value representing this loss of fit.

$$ E^2 = \frac{1}{N_1} \sum_{n=n_2}^{n_3} \left[ e_2(n)^2 - e_1(n)^2 \right] $$   {3.44}


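THEOREM 1:

A necessary and sufficient condition for

$$ J_2^2 < J_1^2 $$   {3.45}

is that the following condition must be met:

$$
\frac{1}{N_2} \left[ \sum_{n=n_1}^{n_2-1} e_2(n)^2 + \sum_{n=n_3+1}^{n_4} e_2(n)^2 \right]
< \frac{1}{N_1} \sum_{n=n_2}^{n_3} e_2(n)^2 - \left[ \frac{N_1+N_2}{N_2} \right] E^2
$$   {3.46}

PROOF:

Substituting {3.37}, {3.41}, and {3.44} into {3.45} yields:

$$
\frac{1}{N_1+N_2} \left[ \sum_{n=n_1}^{n_2-1} e_2(n)^2 + \sum_{n=n_2}^{n_3} e_2(n)^2 + \sum_{n=n_3+1}^{n_4} e_2(n)^2 \right]
< \frac{1}{N_1} \sum_{n=n_2}^{n_3} e_2(n)^2 - E^2
$$   {3.47}

Multiply by $N_1+N_2$ and transpose the middle term from the left;

$$
\sum_{n=n_1}^{n_2-1} e_2(n)^2 + \sum_{n=n_3+1}^{n_4} e_2(n)^2
< \left[ \frac{N_1+N_2}{N_1} - 1 \right] \sum_{n=n_2}^{n_3} e_2(n)^2 - (N_1+N_2)\, E^2
$$   {3.48}

Dividing both sides by $N_2$ yields our desired condition.

$$
\frac{1}{N_2} \left[ \sum_{n=n_1}^{n_2-1} e_2(n)^2 + \sum_{n=n_3+1}^{n_4} e_2(n)^2 \right]
< \frac{1}{N_1} \sum_{n=n_2}^{n_3} e_2(n)^2 - \left[ \frac{N_1+N_2}{N_2} \right] E^2
$$   {3.46}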


It is logical to ask how the condition of Eq. {3.46}
could arise. The condition states that the average fit over
the added end regions must be less than the average fit over
the middle region minus the last term on the right side.
Since $N_1 \gg N_2$, the last term on the right side of Eq. {3.46}
would be significant, and the error of the end regions must
be much smaller than the average error over $(n_2,n_3)$ for Eq.
{3.46} to hold. Two obvious special cases can arise that
satisfy the condition of Eq. {3.46}.

[1]  In the case where the forced zero-valued data
points in the expanded region correspond to the actual
input sequence and the natural system dynamics (and no
noise), then it follows that $E^2$ in {3.44} will be zero.
As an example, consider a stable system excited by the
following waveform {u(n)}.

[Figure: input waveform u(n) plotted against n, with $n_1$ and $n_2$ marked on the n axis]

The output {y(n)} might have a similar shape.

[Figure: output waveform y(n) plotted against n, with $n_3$ and $n_4$ marked on the n axis]

In the preceding figure, the expanded data region
contains actual zero-valued input-output values, $e_2(n)$
will be zero in the expanded regions, and $J_2^2 \le J_1^2$.

[2]  If the first region $(n_2,n_3)$ contains data values that
don't exactly fit the model equation, and the additional
measurements in the larger region $(n_1,n_4)$ happened to
contain data that exactly, or almost exactly, fit the
model equation, then the average error over the larger
region could be lower.

Both  of  these  special  cases  are  possible,  but  it  appears 
highly  unlikely  that  either  will  occur  in  practice.   The 
special  requirements  on  the  data  sequences  for  these  cases 
are  examples  of  pathological  situations.   The  probability  of 
their  occurrence  is  so  small  as  not  to  be  meaningful. 
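
Theorem 1 can be exercised numerically on a small case. The sketch below uses a hypothetical one-term model y(n) = theta*u(n-m) + e(n) so that the least squares fit is a scalar division, takes the Covariance limits as the inner region and the Autocorrelation limits as the expanded region, and returns both fitting errors together with the two sides of condition {3.46}. All names and data values are assumptions for illustration.

```python
def theorem_terms(u, y, S, T, m=1):
    """Return (J1, J2, lhs, rhs) for the model y(n) = theta*u(n-m) + e(n).

    J1, J2 are the average squared errors over the inner (Covariance) and
    expanded (Autocorrelation) regions; lhs and rhs are the two sides of
    condition {3.46}. u and y map a time index to a measured value.
    """
    def rows(lo, hi):   # (x(n), y(n)) pairs, zero outside the measured span
        return [(u.get(n - m, 0.0), y.get(n, 0.0)) for n in range(lo, hi + 1)]

    def fit(pts):       # scalar least squares estimate of theta
        return sum(x * v for x, v in pts) / sum(x * x for x, _ in pts)

    def sse(theta, pts):
        return sum((v - theta * x) ** 2 for x, v in pts)

    n1, n2, n3, n4 = S, S + m, T, T + m
    inner = rows(n2, n3)
    ends = rows(n1, n2 - 1) + rows(n3 + 1, n4)
    N1, N2 = len(inner), len(ends)
    th1, th2 = fit(inner), fit(inner + ends)
    J1 = sse(th1, inner) / N1
    J2 = sse(th2, inner + ends) / (N1 + N2)
    E2 = (sse(th2, inner) - sse(th1, inner)) / N1
    lhs = sse(th2, ends) / N2
    rhs = sse(th2, inner) / N1 - (N1 + N2) / N2 * E2
    return J1, J2, lhs, rhs


# Hypothetical data: y(n) = 0.8*u(n-1) plus a fixed small disturbance d(n)
u = {1: 1.0, 2: -2.0, 3: 0.5, 4: 3.0, 5: -1.0, 6: 2.0, 7: -0.5, 8: 1.5}
d = {1: 0.3, 2: -0.2, 3: 0.1, 4: 0.25, 5: -0.15, 6: 0.05, 7: -0.3, 8: 0.2}
y = {n: 0.8 * u.get(n - 1, 0.0) + d[n] for n in range(1, 9)}
typical = theorem_terms(u, y, 1, 8)

# Pathological variant: the padded zeros happen to agree with the data
u_zero, y_zero = dict(u), dict(y)
u_zero[8], y_zero[1] = 0.0, 0.0
pathological = theorem_terms(u_zero, y_zero, 1, 8)
```

In the typical case the expanded fit is worse and condition {3.46} fails; in the contrived variant the end regions fit exactly, the condition holds, and the expanded fit has the lower average error.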

The value of Theorem 1 resides not in the elegance of a
mathematical proof, but in the simplicity of its proof and
the importance of its meaning. Theorem 1 basically proves
that any least squares error minimization method other than
the Covariance method will produce a higher average fitting
error in all but unlikely pathological cases. Therefore,
any systems characterization or parameter estimation
technique based on a least squares minimization method
different from the Covariance method (e.g. Prewindowed,
Postwindowed, Autocorrelation, etc.) will generally produce
suboptimal fitting error results. This result is important
for the work that follows, since it is well known that
approximations made early in certain recursive algorithms
often grow and lead to significant errors later on, and we
want to use a recursive algorithm to efficiently evaluate
the model growth.

The  next  section  provides  some  computer  simulated 
experimental  verification  of  the  results  of  this  section. 
Other  factors  affecting  the  accuracy  of  systems 
characterization  and  parameter  estimation  are  also  examined. 

C.   SIMULATION  EXPERIMENTS 

Experiment 1:
DESCRIPTION: An investigation of the effects of various
least squares error methods, and the length of the observed
data [S<=n<=T], on the accuracy of the characterization of
known typical linear and nonlinear systems.

CRITERION: Square root of the average sum squared fitting
error, J. Note that we minimize $J^2$ but examine J. This is
done for clarity of graphical presentation.

For the first part of the experiment, we synthesize the
MA(3) system;

y(n) = 1.0u(n) + .8u(n-1) + .6u(n-2) - .3u(n-3)   {3.49}

The  following  Test  Procedure  is  used  repeatedly. 

TEST  PROCEDURE: 

Generate a random sequence for {u(n)}, uniformly distributed
between the amplitude limits [-5,5], and start this input
through the system (with zero initial conditions) at discrete
time n=1. Record the observations {u(n); S<=n<=T} and
{y(n); S<=n<=T} for S and T specified below, and use them to
minimize the least squares equation error of {3.12}. Examine
the use of the Prewindowed (P), Postwindowed (W),
Autocorrelation (A), and Covariance (C) methods. The value of
S is chosen to be 11, and T varies from 50 to 1000 in steps of
50. The experiment is carried out over an ensemble of ten (10)
runs, with different, but equivalently distributed, random
input sequences. For the data obtained from the ensemble of
ten runs, plot the maximum, minimum, and average value of J,
as a function of the value of T and of the choice of
minimization method.
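
The test procedure can be sketched in a few lines for the MA(3) case. This is only an illustration under stated assumptions: the function names (`ma3_output`, `fit_error`), the zero-padding convention used for the three windowed methods, and the use of NumPy's generic least squares solver are choices made for this sketch, not the definitions of Eqs. {3.30} through {3.32}, which are not reproduced here. A single run with T=200 is shown rather than the full ensemble.

```python
import numpy as np

def ma3_output(u):
    """Noise-free MA(3) system of Eq. {3.49}, zero initial conditions."""
    b = np.array([1.0, 0.8, 0.6, -0.3])
    return np.convolve(u, b)[:len(u)]

def fit_error(u_obs, y_obs, method, m=3):
    """Least squares fit of an MA(m) model; returns (theta, J).

    method: 'C' covariance, 'P' prewindowed, 'W' postwindowed,
            'A' autocorrelation. The windowed methods replace samples
            outside the record with zeros (assumption of this sketch).
    """
    pre = method in ("P", "A")     # assume zeros before the record
    post = method in ("W", "A")    # assume zeros after the record
    L = len(u_obs)
    u_pad = np.concatenate([np.zeros(m), u_obs, np.zeros(m)])
    y_pad = np.concatenate([np.zeros(m), y_obs, np.zeros(m)])
    start = m if pre else 2 * m              # first row (padded index)
    stop = L + 2 * m if post else L + m      # one past the last row
    X = np.array([[u_pad[n - k] for k in range(m + 1)]
                  for n in range(start, stop)])
    y = y_pad[start:stop]
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    J = np.sqrt(np.mean((y - X @ theta) ** 2))
    return theta, J

rng = np.random.default_rng(0)
u = rng.uniform(-5.0, 5.0, size=1000)    # input applied from n = 1
y = ma3_output(u)
S, T = 11, 200
u_obs, y_obs = u[S - 1:T], y[S - 1:T]    # samples S <= n <= T
errors = {k: fit_error(u_obs, y_obs, k)[1] for k in "CPWA"}
```

In this sketch the Covariance rows use only measured samples, so its J is at the level of rounding error, while the padded zeros of the other three methods leave a nonzero J.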

For the second part of the experiment, synthesize the
following ARMA(2,2) system and repeat the test procedure.

y(n) = 1.0u(n) + .8u(n-1) + .6u(n-2) - .9y(n-1) - .7y(n-2)   {3.50}

For the third part of the experiment, synthesize the
following BVM system and repeat the test procedure.

y(n) = 1.0u(n) + .8u(n-1) + .6u(n-2) - .9y(n-1) - .7y(n-2)
       + .2u(n)u(n) + .15u(n-1)u(n-4) - .3y(n-2)y(n-4)
       - .16u(n-1)y(n-1) + .05u(n-2)y(n-4)   {3.51}

Figures  4  through  7  present  the  maximum,  minimum,  and 
average  values  of  J  versus  T  and  the  choice  of  the  least 
squares  error  minimization  method  for  the  MA(3)  model  of  Eq. 
{3.49}.   As  expected,  the  A,  P,  and  W  methods  show  improved 
performances  with  increasing  T,  but  even  at  T=1000,  these 
methods  are  significantly  inferior  to  the  C  method. 

It is conjectured that the slight increase in the average
value of J as T increases with the Covariance method is due to
the finite precision of the computer used for these
experiments. The experiments could be repeated using double
precision variables in an attempt to verify this conjecture.
We are actually approximating the N equations $X\theta = y$
for the 4 unknowns $\theta$. Since there are many more
measurement equations than unknowns, it is natural that the
average error should be slightly higher as T (and therefore N)
gets larger.

Figure 8 shows how the choice among the four error
minimization methods affects the matrices and vectors
involved in the evaluation of the MA(3) model, for different
values of T. Note that the R matrix and r vector have been
normalized by dividing each element by the first-row,
first-column entry of R. This does not affect the answer and
it provides for an easier comparison of the twelve cases shown.

Since the Covariance method uses only the exact data
measurements, we denote as exact the values of R and r in
the Covariance method of Figure 8. The corresponding matrix
and vector in the other three methods can therefore be
considered to have errors. The important thing to recognize
here is that errors in the third decimal place in the values
of the R matrix in these other methods translate to more
significant errors in the inverse of R, and ultimately into
substantial errors in the estimates of J and θ.

Figure 8: CONTENTS OF MATRICES AND VECTORS UNDER VARYING
CONDITIONS, FOR A MA(3) MODEL AND SYSTEM (Note: Matrix R and
vector r have been normalized by the first-row, first-column
entry of R.)

While we are interested in determining the least squares
method that provides the minimum J, we have also evaluated
the typical offset in the coefficient estimates that results
from the use of these four methods, and present this
information in Figures 9 through 12. The erratic behavior
of the A method in estimating the coefficient values appears
to be the result of the padded zeros on both ends of the
data matrix X. The appearance of similar curves in Figures
9 through 12 for the A and W cases can be explained as
follows. Both the Autocorrelation (A) and Postwindowed (W)
cases have padded zeros at the bottom end of the matrix X
given by Eq. {3.30} and {3.32} respectively. The effect of
the padded zeros at the top of the Autocorrelation case X
matrix decreases as N gets large, and the A and W cases
approach each other in this limit.

Figures 4 through 12 show the significant differences
resulting from the choice of error minimization method.
This choice directly affects the contents of R and r, which
in turn affects θ and therefore J. A logical conjecture is
that the condition number (ratio of largest to smallest
eigenvalue) of the matrix R could be a good indicator of
the quality of the least squares fit. In other words, the
better conditioned the matrix (lower condition number),
the lower the corresponding fitting error J. While
aesthetically pleasing, this conjecture is not borne out by
experience. In over 30 cases of linear and nonlinear
systems modeled using each of these four error minimization
methods, the corresponding R matrices were all well
conditioned (low condition number), there was no significant
difference in condition number between the four methods, and
there was no direct correlation between lowest condition
numbers and lowest fitting error J. Condition number data
are included in the typical results of Figure 8.

This is explained by the following. The fit J is a
function of the entire coefficient vector θ and the vector
r. Since the coefficient vector θ is a function of both the
matrix R (actually the inverse of R) and the vector r, the
condition number of R is an insufficient measure of the
accuracy of θ, and therefore is an insufficient measure of
the quality of the fit J.
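
For reference, the condition number discussed above can be computed as follows. This is a minimal sketch: the helper name `condition_number` and the Covariance-style data matrix built from a random input are assumptions made for illustration, not data taken from Figure 8.

```python
import numpy as np

def condition_number(X):
    """Ratio of largest to smallest eigenvalue of R = X^T X / N."""
    R = X.T @ X / X.shape[0]
    eig = np.linalg.eigvalsh(R)    # R is symmetric; eigenvalues ascending
    return eig[-1] / eig[0]

rng = np.random.default_rng(1)
u = rng.uniform(-5.0, 5.0, size=500)
# Data matrix for an MA(3) model: rows [u(n), u(n-1), u(n-2), u(n-3)],
# built only from measured samples (no padded zeros).
X = np.column_stack([u[3 - k:len(u) - k] for k in range(4)])
kappa = condition_number(X)
```

A value of `kappa` near 1 indicates a well conditioned R; as the text notes, that alone says nothing about the accuracy of θ, which also depends on r.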

Figures 13 through 16 present the results of the
experiment for the ARMA(2,2) model of Eq. {3.50}. Figures
17 through 20 present the results of the experiment for the
BVM(2,4) model of Eq. {3.51}. Both of these sets of figures
indicate the superior performance of the Covariance (C)
method in minimizing the fitting error criterion J.

This experiment has led to a greater understanding of
the accuracy of the different minimization techniques with
respect to the size of the observation sequences for a
nonrecursive model, a recursive model, and a nonlinear model
of the BVM form. For the models examined, the Covariance
least squares error minimization method is superior to the
Prewindowed, Postwindowed, and Autocorrelation methods.

These  results  are  representative  of  those  obtained  with 
other  system  equations.   The  conclusion  to  be  drawn  from 
simulation  Experiment  1  is  that  the  Covariance  method  is  the 
most  accurate  of  the  four  methods.   Since  our  primary 
problem  is  accurately  characterizing  systems  whose  exact 
mathematical  form  is  unknown,  the  Covariance  method  is 
adopted  for  the  rest  of  the  work  in  this  thesis.   This 
avoids introducing offset errors in J and θ that might give
misleading results later in our model growth techniques.

The next section examines another factor that can affect
the estimates of $J^2$ and θ: output measurement noise. This
is included for convenience at the present time, and is
referred to again later on.

D.   EFFECTS  OF  OUTPUT  MEASUREMENT  NOISE

If the system output is contaminated with additive noise
{v(n)}, then the evaluation of the model and the estimates
of the model coefficients may be affected (see footnote 4).
If the additive noise is uncorrelated with the system input,
and the model is nonrecursive (e.g. no $\theta_{0,q}$ or
$\theta_{p,q}$ terms in a BVM, which thereby reduces to a MA or
VOL model), then the effect of the additive noise on the
coefficient estimates will generally be small. This effect
approaches zero in the limit as the size of the data segment
(T-S+1) gets large. This property of a nonrecursive model is
well known in the literature [Ref. 13 and Ref. 14, pp. 41 and
144].

To better understand the effect of uncorrelated additive
output noise in the case of a linear recursive model (see
footnote 5), denote the noisy output sequence as {z(n)};

z(n) = y(n) + v(n)   for all S<=n<=T   {3.52}

To utilize the equation error minimization techniques
discussed at the beginning of this chapter, we substitute
z(n) for y(n) in the evaluation equations, and Eq. {3.5}
becomes;

$z(n) = \boldsymbol{\theta}^{*T}\mathbf{x}^*(n) + e(n)$   {3.53}


4  Other  measurement  noises  such  as  additive  input  noise 
or  multiplicative  input  and/or  output  noise  are  also 
possible,  but  are  not  considered. 

5  The following analysis holds for linear recursive or
nonrecursive models (e.g. ARMA). There appears no tractable
way to extend it to recursive nonlinear models (e.g. BVM).

where, under the condition of additive output noise, θ* is
the coefficient vector, and x*(n) is the corresponding term
vector based on u(n) and z(n), instead of u(n) and y(n).
The "*" is used to denote factors affected by the noise.

$\mathbf{x}^*(n) = \mathbf{x}(n)\big|_{\{y(n)\}=\{z(n)\}}$
$\quad = \mathbf{x}(n)\big|_{\{y(n)\}=\{y(n)\}} + \mathbf{x}(n)\big|_{\{u(n)\}=0,\ \{y(n)\}=\{v(n)\}}$
$\quad = \mathbf{x}(n) + \mathbf{x}_v(n)$   {3.54}

Note that if the model is nonrecursive, $\mathbf{x}_v(n) = \mathbf{0}$,
and $\mathbf{x}^*(n) = \mathbf{x}(n)$. A more interesting example is as
follows: if $\mathbf{x}(n)^T = [\,u(n),\ u(n-1),\ y(n-1)\,]$, then

$\mathbf{x}^*(n)^T = [\,u(n),\ u(n-1),\ z(n-1)\,] = [\,u(n),\ u(n-1),\ y(n-1)+v(n-1)\,]$

and $\mathbf{x}_v(n)^T = [\,0,\ 0,\ v(n-1)\,]$.

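Substituting Eq. {3.53} and Eq. {3.54} into Eq. {3.9}, and
minimizing with respect to θ* by matrix calculus, yields the
least squares solution equations;

$\frac{1}{N}\,[X^{*T}X^*]\,\boldsymbol{\theta}^* = \frac{1}{N}\,X^{*T}\mathbf{z}$   {3.55}

where $\mathbf{z}^T = [\,z(n_2),\ z(n_2+1),\ \ldots,\ z(n_3)\,]$   {3.56}

and $X^{*T} = [\,\mathbf{x}^*(n_2)\ \ \mathbf{x}^*(n_2+1)\ \ \ldots\ \ \mathbf{x}^*(n_3)\,]$   {3.57}

Substituting Eq. {3.54} and {3.57} into {3.55} yields;

$[\,R + R_v\,]\,\boldsymbol{\theta}^* = \mathbf{r} + \mathbf{r}_v$   {3.58}
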
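where the following matrices and vectors are defined;

$R = \frac{1}{N}\,X^TX$   {3.59}

$R_v = \frac{1}{N}\,X_v^TX_v$
$X_v^T = [\,\mathbf{x}_v(n_2)\ \ \mathbf{x}_v(n_2+1)\ \ \ldots\ \ \mathbf{x}_v(n_3)\,]$
$\mathbf{r} = \frac{1}{N}\,X^T\mathbf{y}$
$\mathbf{r}_v = \frac{1}{N}\,X_v^T\mathbf{v}$
$\mathbf{v}^T = [\,v(n_2),\ v(n_2+1),\ \ldots,\ v(n_3)\,]$   {3.60}-{3.65}

Solving {3.58} for the model coefficient vector θ*;

$\boldsymbol{\theta}^* = [\,R + R_v\,]^{-1}\,[\,\mathbf{r} + \mathbf{r}_v\,]$   {3.66}

The first term on the right side of Eq. {3.66} can be
simplified when the inverse exists.

$[\,R + R_v\,]^{-1} = [\,R\,[\,I + R^{-1}R_v\,]\,]^{-1} = [\,I + R^{-1}R_v\,]^{-1}R^{-1}$   {3.67}

Substituting {3.67} into {3.66} and using $\boldsymbol{\theta} = R^{-1}\mathbf{r}$ from
Eq. {3.19} yields;

$\boldsymbol{\theta}^* = [\,I + R^{-1}R_v\,]^{-1}R^{-1}\,[\,\mathbf{r} + \mathbf{r}_v\,]$
$\quad = [\,I + R^{-1}R_v\,]^{-1}R^{-1}\mathbf{r} + [\,I + R^{-1}R_v\,]^{-1}R^{-1}\mathbf{r}_v$
$\quad = [\,I + R^{-1}R_v\,]^{-1}\boldsymbol{\theta} + [\,I + R^{-1}R_v\,]^{-1}R^{-1}\mathbf{r}_v$   {3.68}
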


Note that when the measurement noise {v(n); S<=n<=T} is equal
to zero, $R_v$ reduces to the null matrix, $\mathbf{r}_v$ reduces to the
null vector, and Eq. {3.68} yields θ* = θ. We are interested
in the noisy measurement case, and denote the distortion in
the model coefficients as $\boldsymbol{\theta}_d$, where

$\boldsymbol{\theta}_d = \boldsymbol{\theta}^* - \boldsymbol{\theta}$   {3.69}

Substituting Eq. {3.69} into Eq. {3.68} yields an expression
for this distortion in the model coefficients.

$\boldsymbol{\theta}_d = [\,I + R^{-1}R_v\,]^{-1}\boldsymbol{\theta} + [\,I + R^{-1}R_v\,]^{-1}R^{-1}\mathbf{r}_v - \boldsymbol{\theta}$
$\quad = [\,I + R^{-1}R_v\,]^{-1}\,[\,\boldsymbol{\theta} + R^{-1}\mathbf{r}_v\,] - \boldsymbol{\theta}$   {3.70}

Eq. {3.70} gives an exact expression for the coefficient
distortion due to additive output noise, but its meaning is
hard to appreciate directly because of the four inversions.
The first term on the right side can be expanded in a
geometric series;

$[\,I + R^{-1}R_v\,]^{-1} = I - R^{-1}R_v + [\,R^{-1}R_v\,]^2 - [\,R^{-1}R_v\,]^3 + \ldots$   {3.71}

This series is valid when the absolute values of the
eigenvalues of matrix $[\,R^{-1}R_v\,]$ are all less than 1. Matrix
powers greater than one are negligible when the eigenvalues
are small compared to one. These conditions are met when
the total power of the additive noise is small in comparison
to the total power of the system output; i.e. high SNR.
Using this assumption, Eq. {3.71} is approximated by the
first two terms of the expansion, and Eq. {3.70} becomes;

$\boldsymbol{\theta}_d \approx [\,I - R^{-1}R_v\,]\,[\,\boldsymbol{\theta} + R^{-1}\mathbf{r}_v\,] - \boldsymbol{\theta}$
$\quad = R^{-1}\,\{\,[\,I - R_vR^{-1}\,]\,\mathbf{r}_v - R_v\boldsymbol{\theta}\,\}$   {3.72}

The above equation can be interpreted as describing the
model coefficient distortion vector as composed of the
difference of two vectors. One is a constant term, and the
other is a multiplicative function of the noise-free model
coefficients. Note that the only inversion needed for this
approximation is that of the matrix R. Also, both vectors
are directly proportional to the inverse of the matrix R,
which is independent of the particular additive noise
characteristics.

The distortion on the estimates of the model coefficients
is therefore the difference of two vectors;

$\boldsymbol{\theta}_c = R^{-1}\,[\,\mathbf{r}_v - R_vR^{-1}\mathbf{r}_v\,] = [\,I - R^{-1}R_v\,]\,R^{-1}\mathbf{r}_v$   {3.73}

and

$\boldsymbol{\theta}_m = R^{-1}R_v\boldsymbol{\theta}$   {3.74}

where $\boldsymbol{\theta}_d = \boldsymbol{\theta}_c - \boldsymbol{\theta}_m$   {3.75}


This shows how the coefficient distortion of a linear
recursive model depends upon the choice of the particular
model terms, time averages of the system input and output,
and time averages of the additive output noise.
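
The quality of the two-term approximation can be checked numerically. The sketch below compares the exact expression {3.70} with the first-order form {3.72} and with the split {3.73} through {3.75}. All numerical values (the stand-in matrix R, the small $R_v$ and $\mathbf{r}_v$, and the coefficient vector) are arbitrary assumptions chosen to mimic a high SNR case; they are not taken from the experiments.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(500, 3))
R = A.T @ A / 500                     # stand-in for R = X^T X / N
Rv = np.diag([0.0, 0.0, 0.01])        # small noise matrix (high SNR)
rv = np.array([0.0, 0.0, 0.002])      # small noise vector
theta = np.array([1.0, 0.5, 0.6])     # noise-free coefficients

Rinv = np.linalg.inv(R)
I = np.eye(3)

# Exact distortion, Eq. {3.70}
exact = np.linalg.inv(I + Rinv @ Rv) @ (theta + Rinv @ rv) - theta

# First-order approximation, Eq. {3.72}
approx = Rinv @ ((I - Rv @ Rinv) @ rv - Rv @ theta)

# Constant and multiplicative parts, Eqs. {3.73} and {3.74}
theta_c = (I - Rinv @ Rv) @ Rinv @ rv
theta_m = Rinv @ Rv @ theta

relative_gap = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
```

With eigenvalues of $R^{-1}R_v$ on the order of 0.01, `relative_gap` is a few percent at most, and `theta_c - theta_m` reproduces `approx` to rounding error.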

As an illustrative example of the effect of noise on a
linear recursive model, consider the ARMA(1,1) model;

$y(n) = \theta_{1,0}(0)\,u(n) + \theta_{1,0}(1)\,u(n-1) + \theta_{0,1}(1)\,y(n-1)$   {3.76}

The following matrices and vectors are written by inspection;

$\mathbf{x}(n)^T = [\,u(n),\ u(n-1),\ y(n-1)\,]$   {3.77}

$\mathbf{x}_v(n)^T = [\,0,\ 0,\ v(n-1)\,]$   {3.78}

$X = \begin{bmatrix} u(n_2) & u(n_2-1) & y(n_2-1) \\ u(n_2+1) & u(n_2) & y(n_2) \\ \vdots & \vdots & \vdots \\ u(n_3) & u(n_3-1) & y(n_3-1) \end{bmatrix}$   {3.79}

$X_v = \begin{bmatrix} 0 & 0 & v(n_2-1) \\ 0 & 0 & v(n_2) \\ \vdots & \vdots & \vdots \\ 0 & 0 & v(n_3-1) \end{bmatrix}$   {3.80}

$\mathbf{r} = \frac{1}{N}X^T\mathbf{y} = \begin{bmatrix} \frac{1}{N}\sum_{n=n_2}^{n_3} u(n)\,y(n) \\ \frac{1}{N}\sum_{n=n_2}^{n_3} u(n-1)\,y(n) \\ \frac{1}{N}\sum_{n=n_2}^{n_3} y(n-1)\,y(n) \end{bmatrix}$   {3.81}

$\mathbf{r}_v = \frac{1}{N}X_v^T\mathbf{v} = \begin{bmatrix} 0 \\ 0 \\ \frac{1}{N}\sum_{n=n_2}^{n_3} v(n)\,v(n-1) \end{bmatrix}$   {3.82}

$R = \frac{1}{N}X^TX = \begin{bmatrix} \frac{1}{N}\sum u(n)^2 & \frac{1}{N}\sum u(n)\,u(n-1) & \frac{1}{N}\sum u(n)\,y(n-1) \\ & \frac{1}{N}\sum u(n-1)^2 & \frac{1}{N}\sum u(n-1)\,y(n-1) \\ \text{SYMMETRIC} & & \frac{1}{N}\sum y(n-1)^2 \end{bmatrix}$   {3.83}

with all sums in {3.83} taken over $n = n_2, \ldots, n_3$.

$R_v = \frac{1}{N}X_v^TX_v = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & \frac{1}{N}\sum_{n=n_2}^{n_3} v(n-1)^2 \end{bmatrix}$   {3.84}

$R_v\boldsymbol{\theta} = R_v \begin{bmatrix} \theta_{1,0}(0) \\ \theta_{1,0}(1) \\ \theta_{0,1}(1) \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \theta_{0,1}(1)\,\frac{1}{N}\sum_{n=n_2}^{n_3} v(n-1)^2 \end{bmatrix}$   {3.85}

Representing matrix R of Eq. {3.83} in the following shorthand;

$R = \begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix}$   {3.86}

the inverse of matrix R can be written as shown below.

$R^{-1} = \begin{bmatrix} g & h & k \\ h & m & q \\ k & q & s \end{bmatrix} = \frac{1}{|R|} \begin{bmatrix} df - e^2 & ce - bf & be - cd \\ ce - bf & af - c^2 & bc - ae \\ be - cd & bc - ae & ad - b^2 \end{bmatrix}$   {3.87}

where $|R| = adf + 2bec - c^2d - b^2f - e^2a$   {3.88}

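Substituting {3.85} and {3.87} into {3.74} yields;

$\boldsymbol{\theta}_m = \theta_{0,1}(1)\,\frac{1}{N}\sum_{n=n_2}^{n_3} v(n-1)^2 \begin{bmatrix} k \\ q \\ s \end{bmatrix}$   {3.89}

Substituting {3.82}, {3.84}, and {3.85} into {3.73} yields;

$\boldsymbol{\theta}_c = \frac{1}{N}\sum_{n=n_2}^{n_3} v(n)\,v(n-1) \left[\,1 - s\,\frac{1}{N}\sum_{n=n_2}^{n_3} v(n-1)^2\,\right] \begin{bmatrix} k \\ q \\ s \end{bmatrix}$   {3.90}

Substituting {3.89} and {3.90} into {3.75} provides an
expression for the distortion of the coefficient vector;

$\boldsymbol{\theta}_d = \left\{ \frac{1}{N}\sum_{n=n_2}^{n_3} v(n)\,v(n-1) \left[\,1 - s\,\frac{1}{N}\sum_{n=n_2}^{n_3} v(n-1)^2\,\right] - \theta_{0,1}(1)\,\frac{1}{N}\sum_{n=n_2}^{n_3} v(n-1)^2 \right\} \begin{bmatrix} k \\ q \\ s \end{bmatrix}$   {3.91}

where k, q, and s are elements of the inverse of matrix R.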

We can now directly examine the effect of additive noise
on the coefficient distortion vector, in terms of the time
averages of the additive noise. If the additive output
noise is ergodic and uncorrelated with itself, the first
term on the right side of Eq. {3.91} is small and will
approach zero in the limit as N → ∞. The magnitude of the
coefficient distortion values will then be directly
proportional to both the sample autocorrelation of the output
noise at zero lag (its average power) and the value of the
recursive coefficient. Under this condition the coefficient
distortion vector equals the negative of Eq. {3.89}. The
value of $\boldsymbol{\theta}_d$ indicated by Eq. {3.91} has been observed in
simulation experiments.
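
The large-N behavior described above can be reproduced with a short simulation. This is a sketch under assumptions that are not in the original: the coefficient values, the Gaussian white noise with standard deviation 0.5, the record length, and the variable names are all illustrative choices. The measured distortion of the recursive coefficient is compared with the negative of Eq. {3.89}.

```python
import numpy as np

rng = np.random.default_rng(3)
N_total = 200_000
# [theta_{1,0}(0), theta_{1,0}(1), theta_{0,1}(1)], assumed values
theta = np.array([1.0, 0.5, 0.6])
u = rng.uniform(-5.0, 5.0, size=N_total)
v = rng.normal(scale=0.5, size=N_total)   # white output noise (assumed)

# Noise-free ARMA(1,1) output of Eq. {3.76}, zero initial conditions
y = np.zeros(N_total)
y[0] = theta[0] * u[0]
for n in range(1, N_total):
    y[n] = theta[0] * u[n] + theta[1] * u[n - 1] + theta[2] * y[n - 1]
z = y + v                                 # noisy output, Eq. {3.52}

def fit(out):
    """Least squares fit using only measured samples (Covariance style)."""
    X = np.column_stack([u[1:], u[:-1], out[:-1]])
    est, *_ = np.linalg.lstsq(X, out[1:], rcond=None)
    return X, est

X, theta_clean = fit(y)
_, theta_noisy = fit(z)
theta_d = theta_noisy - theta_clean       # measured distortion, Eq. {3.69}

# Large-N prediction: negative of Eq. {3.89}; column 2 of R^-1 is [k, q, s]
R_inv = np.linalg.inv(X.T @ X / X.shape[0])
noise_power = np.mean(v[:-1] ** 2)
predicted = -theta[2] * noise_power * R_inv[:, 2]
```

For these settings the recursive coefficient is biased toward zero by roughly one percent of the input scale, and `theta_d[2]` agrees with `predicted[2]` to within the sampling fluctuation of the run.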

The preceding example serves to demonstrate the new
insight that is available as a result of the development of
equations {3.52} through {3.75}. The effect of the presence
and properties of the additive noise on the distortion of
the model coefficients follows directly from an examination
of these equations. The impact of this will be addressed
again in Chapters VI and VII.

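From the preceding development of an expression for the
distortion in the model coefficients resulting from additive
output noise, an expression for the related increase in the
fitting error can also be obtained. Substitute z(n) for
y(n) in the development of Eq. {3.20}, denote the minimum
error fitting criterion resulting from the noisy data as $J^{*2}$,
and make use of {3.52} through {3.69}.

$J^{*2} = \frac{1}{N}\mathbf{z}^T\mathbf{z} - [\,\mathbf{r} + \mathbf{r}_v\,]^T\boldsymbol{\theta}^*$
$\quad = \frac{1}{N}\mathbf{y}^T\mathbf{y} + \frac{2}{N}\mathbf{v}^T\mathbf{y} + \frac{1}{N}\mathbf{v}^T\mathbf{v} - \mathbf{r}^T\boldsymbol{\theta} - \mathbf{r}^T\boldsymbol{\theta}_d - \mathbf{r}_v^T\boldsymbol{\theta} - \mathbf{r}_v^T\boldsymbol{\theta}_d$   {3.92}

Denote the distortion in the minimum value of $J^2$ as $J_d^2$.

$J_d^2 = J^{*2} - J^2$   {3.93}

Substituting {3.92} into {3.93} and using {3.20} yields;

$J_d^2 = \frac{2}{N}\mathbf{v}^T\mathbf{y} + \frac{1}{N}\mathbf{v}^T\mathbf{v} - \mathbf{r}^T\boldsymbol{\theta}_d - \mathbf{r}_v^T\boldsymbol{\theta} - \mathbf{r}_v^T\boldsymbol{\theta}_d$   {3.94}
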
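Using the assumption that the additive noise {v(n)} is
ergodic and independent of the system output {y(n)}, the
first term on the right side of {3.94} is small, and
approaches zero as N → ∞. Using this common assumption and
substituting {3.72} into {3.94} yields;

$J_d^2 = \frac{1}{N}\mathbf{v}^T\mathbf{v} - \mathbf{r}^TR^{-1}\{\,[\,I - R_vR^{-1}\,]\mathbf{r}_v - R_v\boldsymbol{\theta}\,\} - \mathbf{r}_v^T\boldsymbol{\theta}$
$\qquad - \mathbf{r}_v^TR^{-1}\{\,[\,I - R_vR^{-1}\,]\mathbf{r}_v - R_v\boldsymbol{\theta}\,\}$   {3.95}

From Eq. {3.19} we have $\boldsymbol{\theta}^T = \mathbf{r}^TR^{-1}$. Substituting this into
Eq. {3.95} and simplifying gives;

$J_d^2 = \frac{1}{N}\mathbf{v}^T\mathbf{v} - \boldsymbol{\theta}^T[\,\mathbf{r}_v - R_vR^{-1}\mathbf{r}_v - R_v\boldsymbol{\theta} + \mathbf{r}_v\,] - \mathbf{r}_v^TR^{-1}\{\,[\,I - R_vR^{-1}\,]\mathbf{r}_v - R_v\boldsymbol{\theta}\,\}$
$\quad = \frac{1}{N}\mathbf{v}^T\mathbf{v} + \boldsymbol{\theta}^TR_v\boldsymbol{\theta} - [\,\mathbf{r}_v^TR^{-1} + 2\boldsymbol{\theta}^T\,]\,[\,\mathbf{r}_v - R_vR^{-1}\mathbf{r}_v\,]$   {3.96}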

Equation {3.96} is a new expression for the distortion
in the fitting error criterion of a linear recursive model
caused by additive output noise. Note that if the model had
been nonrecursive, vector $\mathbf{r}_v$ would reduce to the null
vector, and matrix $R_v$ would reduce to the null matrix. In
this special case, Eq. {3.96} reduces to;

$J_d^2 = \frac{1}{N}\mathbf{v}^T\mathbf{v} = \frac{1}{N}\sum_{n=n_2}^{n_3} v(n)^2$   {3.97}

and the distortion in the fitting error criterion would be
equal to the average power of the additive output noise, as
expected.

The ARMA model of Eq. {3.76} is used again in an
illustrative example of the effect of noise on the fitting
error of a linear recursive model. Substituting Eq. {3.76}
through {3.88} into {3.96} produces;

$J_d^2 = \frac{1}{N}\sum_{n=n_2}^{n_3} v(n)^2 + \boldsymbol{\theta}^TR_v\boldsymbol{\theta} - [\,\mathbf{r}_v^TR^{-1} + 2\boldsymbol{\theta}^T\,]\,[\,\mathbf{r}_v - R_vR^{-1}\mathbf{r}_v\,]$
$\quad = \frac{1}{N}\sum_{n=n_2}^{n_3} v(n)^2 + [\,\theta_{0,1}(1)\,]^2\,\frac{1}{N}\sum_{n=n_2}^{n_3} v(n-1)^2$
$\qquad - \frac{1}{N}\sum_{n=n_2}^{n_3} v(n)\,v(n-1) \left[\,s\,\frac{1}{N}\sum_{n=n_2}^{n_3} v(n)\,v(n-1) + 2\,\theta_{0,1}(1)\,\right] \left[\,1 - s\,\frac{1}{N}\sum_{n=n_2}^{n_3} v(n-1)^2\,\right]$   {3.98}

with $\mathbf{r}_v$, $R_v$, and $R^{-1}$ as given in {3.82}, {3.84}, and {3.87}.

If the noise {v(n)} is ergodic and uncorrelated with
itself, the factor premultiplying the third term on the
right side of Eq. {3.98} is small, and approaches zero in
the limit as N → ∞. Using this common assumption, the
fitting error distortion reduces to the following;

$J_d^2 = \frac{1}{N}\sum_{n=n_2}^{n_3} v(n)^2 + [\,\theta_{0,1}(1)\,]^2\,\frac{1}{N}\sum_{n=n_2}^{n_3} v(n-1)^2$   for large N   {3.99}

The preceding equation shows that for uncorrelated additive
output noise, the increase in the fitting error criterion is
proportional to both the power of the additive noise, and
one plus the square of the magnitude of the recursive
coefficient. If the model form had more than one recursive
term, the resulting equation for $J_d^2$ would appear more
complex, but would follow a form related to this example.
The preceding development provides new insight into the
actual effect of additive output noise on the
characterization of linear recursive systems.
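
The large-N result {3.99} can also be checked by simulation. As before, this is only a sketch: the coefficient values, the Gaussian white noise, the record length, and the helper name `min_fit_error` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
N_total = 200_000
theta = np.array([1.0, 0.5, 0.6])         # assumed ARMA(1,1) coefficients
u = rng.uniform(-5.0, 5.0, size=N_total)
v = rng.normal(scale=0.5, size=N_total)   # white output noise (assumed)

# Noise-free output of Eq. {3.76} and noisy measurement of Eq. {3.52}
y = np.zeros(N_total)
y[0] = theta[0] * u[0]
for n in range(1, N_total):
    y[n] = theta[0] * u[n] + theta[1] * u[n - 1] + theta[2] * y[n - 1]
z = y + v

def min_fit_error(out):
    """Minimum average squared equation error for the ARMA(1,1) model."""
    X = np.column_stack([u[1:], u[:-1], out[:-1]])
    est, *_ = np.linalg.lstsq(X, out[1:], rcond=None)
    return np.mean((out[1:] - X @ est) ** 2)

J2_clean = min_fit_error(y)               # J^2, essentially zero here
J2_noisy = min_fit_error(z)               # J*^2
J2_d = J2_noisy - J2_clean                # distortion, Eq. {3.93}

# Large-N prediction of Eq. {3.99}
predicted = np.mean(v[1:] ** 2) + theta[2] ** 2 * np.mean(v[:-1] ** 2)
```

With a noise power of about 0.25 and a recursive coefficient of 0.6, both `J2_d` and `predicted` come out near 0.25 x (1 + 0.36) = 0.34.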

IV.   EVALUATION  OF  MODEL  EQUATIONS


A.   EXISTING  TECHNIQUES 

Given  a  set  of  input  and  output  measurements  and  a  model 
equation  that  is  a  function  of  these  measurements  and  linear 
in  a  set  of  coefficients,  Chapter  III  showed  how  to  obtain 
estimates  for  the  coefficient  values  and  the  error  residual. 
The  literature  reports  [Ref.  17]  that  the  ARMA  model  can 
reasonably  represent  most  linear  systems  of  interest  using 
orders  less  than  m  =  10.   A  current  problem  is  the 
efficiency  of  computation  when  larger  and  more  general  model 
forms  like  VOL  and  BVM  are  considered.   Regardless  of  the 
degree  or  memory  of  the  model,  the  calculation  of  the  model 
fit  involves  solving  the  normal  equations  {3.15}.

This  section  discusses  the  traditional  direct  least 
squares  model  evaluation  technique.   The  next  section 
develops  a  unified  solution  technique  for  the  more  efficient 
recursive  evaluation  of  a  wide  class  of  models.   The  last 
section  compares  the  computational  features  of  these 
evaluation  techniques. 

The traditional modeling technique starts by selecting a
first model $y(n) = \boldsymbol{\theta}^T\mathbf{x}(n)$. We include the index parameter 1
to identify this first model, and write the prediction form
equation as follows.

$y(n,1) = \boldsymbol{\theta}(1)^T\mathbf{x}(n,1) + e(n,1)$   {4.1}

Using {4.1} in place of {3.5}, and following the same
least squares development as Chapter III, yields the normal
equations corresponding to Eq. {3.14}. The index parameter
is included where needed.

$\frac{1}{N}\,[\,X(1)^TX(1)\,]\,\boldsymbol{\theta}(1) = \frac{1}{N}\,X(1)^T\mathbf{y}$   {4.2}

This leads to the model error evaluation and coefficient
estimation equations in terms of this indexed notation;

$J^2(1) = \frac{1}{N}\,\mathbf{y}^T\mathbf{y} - \mathbf{r}(1)^TR(1)^{-1}\mathbf{r}(1)$   {4.3}

$\boldsymbol{\theta}(1) = R(1)^{-1}\mathbf{r}(1)$   {4.4}

where $R(1) = \frac{1}{N}\,X(1)^TX(1)$   {4.5}

and $\mathbf{r}(1) = \frac{1}{N}\,X(1)^T\mathbf{y}$   {4.6}

If the fitting error $J^2(1)$ is too large for the
application, the traditional systems identification
technique is to select a larger model that contains the
terms of the first model plus some additional terms. This
second prediction form model is written as shown below.

$y(n,2) = \boldsymbol{\theta}(2)^T\mathbf{x}(n,2) + e(n,2)$   {4.7}


The technique forms equations like {4.3} and {4.4} for the
model of {4.7}, and continues until the fit $J^2(i)$ of model
number i is within some acceptable limits. This is a
brute-force and inefficient approach since the evaluation of
the second (and subsequent) models does not take advantage of
the solution calculated for the previous model(s).
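
The brute-force procedure just described can be sketched as a loop that re-solves every candidate model from scratch. The function name `direct_fit`, the choice of growing an MA model one delay at a time, and the use of the MA(3) system of Eq. {3.49} as the data source are assumptions made for this illustration only.

```python
import numpy as np

def direct_fit(X, y):
    """Direct evaluation of Eqs. {4.3} through {4.6} for one model."""
    N = len(y)
    R = X.T @ X / N                       # Eq. {4.5}
    r = X.T @ y / N                       # Eq. {4.6}
    theta = np.linalg.solve(R, r)         # Eq. {4.4}, solved without forming R^-1
    J2 = y @ y / N - r @ theta            # Eq. {4.3}
    return theta, J2

rng = np.random.default_rng(5)
u = rng.uniform(-5.0, 5.0, size=400)
y_full = np.convolve(u, [1.0, 0.8, 0.6, -0.3])[:len(u)]   # Eq. {3.49}

# Candidate terms u(n), u(n-1), u(n-2), u(n-3), using measured samples only
terms = np.column_stack([u[3 - k:len(u) - k] for k in range(4)])
y = y_full[3:]

# Models 1..4 each add one term; every model is solved from scratch,
# so nothing computed for model i is reused for model i+1.
fits = [direct_fit(terms[:, :c], y)[1] for c in range(1, 5)]
```

The list `fits` decreases as terms are added and reaches essentially zero for the fourth model, which matches the system. Each pass repeats the full solve, which is the inefficiency the recursive technique of the next section is meant to avoid.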

To appreciate the above point, we digress momentarily to
demonstrate the computational complexity (as measured by the
number of multiplications or divisions) involved with
calculating the inverse of the matrix R(i) when the model
form is the BVM(d,m) introduced in Chapter II. Equation
{2.9} gives the number of coefficients in a BVM as a
function of the choice of the degree and the memory. Table
1 shows the number of coefficients, and therefore the size
of the corresponding R(i) matrix, for any BVM of degree up
to 6 and memory up to 10. In this chapter, the notation
c(i) is used for the size of the i-th model regardless of
its form.


          d=1     d=2     d=3      d=4      d=5       d=6
m=0         1       2       3        4        5         6
m=1         3       9      19       34       55        83
m=2         5      20      55      125      251       461
m=3         7      35     119      329      791      1715
m=4         9      54     219      714     2001      5004
m=5        11      77     363     1364     4367     12375
m=6        13     104     559     2379     8567     27131
m=7        15     135     815     3875    15503     54263
m=8        17     170    1139     5984    26333    100946
m=9        19     209    1539     8854    42503    177099
m=10       21     252    2023    12649    65779    296009

TABLE 1: Number of coefficients in a BVM of degree d and memory m
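
The entries of Table 1 can be regenerated with a one-line count. Equation {2.9} is not reproduced in this section, so the closed form below is an assumption: it counts all products of degree 1 through d formed from the 2m+1 signals u(n), ..., u(n-m), y(n-1), ..., y(n-m), and it agrees with the tabulated values. The function name is hypothetical.

```python
from math import comb

def bvm_coefficient_count(d, m):
    """Number of monomials of degree 1..d in 2m+1 signals.

    Assumed closed form (not quoted from Eq. {2.9}); it reproduces the
    values listed in Table 1.
    """
    return comb(2 * m + 1 + d, d) - 1

# Row m=2 of Table 1, for d = 1..6
table_row_m2 = [bvm_coefficient_count(d, 2) for d in range(1, 7)]
```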

The computational cost of inverting a matrix R(i) of
size c(i) is of the order of 1/3 times the cube of c(i)
multiplicative operations. Table 2 shows the approximate
number of such operations required by the direct least
squares technique for the inversion of the matrix R(i)
corresponding to a BVM of degree d and memory m.

        d=1       d=2       d=3        d=4        d=5        d=6
m=0       1         3         9         22         42         72
m=1       9       243      2287      13100      55450     190600
m=2      42      2667     55460     651000    5271000    3.266E7
m=3     115   1.429E4   5.617E5    1.187E7    1.650E8    1.681E9
m=4     243   5.249E4   3.501E6    1.213E8    2.671E9   3.176E10
m=5     444   1.522E5   1.594E7    8.459E8   2.776E10   6.317E11
m=6     733   3.750E5   5.822E7    4.488E9   2.096E11   6.657E12
m=7    1125   8.200E5   1.804E8   1.940E10   1.242E12   5.326E13
m=8    1638   1.638E6   4.925E8   7.143E10   6.087E12   3.429E14
m=9    2287   3.043E6   1.215E9   2.314E11   2.560E13   1.852E15
m=10   3087   5.334E6   2.760E9   6.746E11   9.487E13   8.646E15

TABLE 2: Number of multiplication operations required for the
matrix inversion involved in the direct least squares
evaluation of a BVM of degree d and memory m.
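
The growth shown in Table 2 follows directly from the stated cost of about one third of the cube of c(i). The sketch below evaluates that estimate; the closed form used for c(i) is the same assumed count that reproduces Table 1, and the function name is hypothetical.

```python
from math import comb

def inversion_cost(d, m):
    """Approximate multiplications to invert R(i): c(i)**3 / 3."""
    c = comb(2 * m + 1 + d, d) - 1    # assumed closed form for c(i)
    return c ** 3 / 3

cost_d2_m3 = inversion_cost(2, 3)     # c(i) = 35
cost_d6_m10 = inversion_cost(6, 10)   # c(i) = 296009
```

For d=2 and m=3 this gives about 1.43E4 operations, while d=6 and m=10 gives about 8.6E15, a span of nearly twelve orders of magnitude.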

It is clear that for degrees above 3, the inversion of
the matrix R(i) required for this direct least squares
evaluation of model i rapidly becomes prohibitively
expensive for increasing d or m. For problems of interest,
however, we want to evaluate such higher degree and/or
memory models of the BVM form.

It  should  be  mentioned  that  for  large  c(i),  the 
computation  of  the  elements  of  the  R(i)  matrix  requires 
approximately  c(i)N  operations.   This  can  dominate  the 
computation  time  if  N  >>  c(i),  as  is  typically  the  case  in 
the  literature.   Even  though  the  correct  model  forms  were 
used  in  the  three  examples  of  experiment  number  1,  the 
Autocorrelation, Prewindowed, and Postwindowed methods still
required a large N to obtain a small fitting error.  On the
other hand, the Covariance method gave superior performance
without requiring N >> c(i).  These results are typical of
those obtained with other computer simulated experiments,
and indicate that an equivalent performing model solution
can be obtained more economically with the Covariance
method.

B.   PRESENTATION  OF  A  RECURSIVE  EVALUATION  TECHNIQUE 

To develop efficient algorithms for evaluating models of
the BVM form of equation {2.7}, we make a change in notation
that will allow us to relate the equations and solutions of
various models.  This notational change is important for
subsequent developments.  We reorder x(n,i) and θ(i),
respectively, in a manner described below.  We denote the
reordered x(n,i) as w(n,i), and the reordered θ(i) as p(i),
such that Eq. {4.7} becomes:

$$ y(n,2) = \mathbf{p}(2)^T \mathbf{w}(n,2) + e(n,2) $$   {4.8}

where

$$ \mathbf{w}(n,2)^T = [\, \mathbf{w}(n,1)^T \mid \mathbf{w}(n,2/1)^T \,] $$   {4.9}

and

$$ \mathbf{w}(n,1) = \mathbf{x}(n,1) $$   {4.10}

and w(n,2/1) is a vector formed by starting with x(n,2),
deleting all of the terms that also exist in w(n,1),
and reducing the size of the resultant vector by
eliminating the spaces of any deleted elements,
and where

$$ \mathbf{p}(2)^T = [\, \mathbf{p}(1/2)^T \mid \mathbf{p}(2/1)^T \,] $$   {4.11}

and

$$ \mathbf{p}(1/2) = \mathbf{p}(1) \ \text{evaluated at the 2nd iteration} $$   {4.12}

and p(2/1) is a vector formed by starting with θ(2),
deleting all of the terms that also exist in p(1),
and reducing the size of the resultant vector by
eliminating the spaces of any deleted elements.
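
The reordering rule for w(n,2/1) can be illustrated with a short Python sketch.  The term labels and the helper `new_terms` are hypothetical and chosen only for illustration; they are not the thesis' own term sets.

```python
def new_terms(x_new, w_old):
    """Return the terms of x_new not already in w_old, order preserved."""
    seen = set(w_old)
    return [t for t in x_new if t not in seen]

# Hypothetical term labels for a first and a second model.
w1 = ["u(n)", "u(n-1)", "y(n-1)"]
x2 = ["u(n)", "u(n-1)", "u(n-2)", "y(n-1)", "y(n-2)"]

w2_given_1 = new_terms(x2, w1)   # plays the role of w(n,2/1)
w2 = w1 + w2_given_1             # plays the role of w(n,2) in Eq. {4.9}
print(w2_given_1)                # ['u(n-2)', 'y(n-2)']
print(w2)
```

The old terms stay at the front of the reordered vector, which is what later lets the second model reuse the solution of the first.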
It remains to show that the evaluation of model equation
{4.8} can be accomplished more efficiently than the
evaluation of the same model given instead by Eq. {4.7}.
Before demonstrating this result, the preceding notation is
generalized for models beyond the first and second.

We recursively define an equation for model i of size
c(i) in terms of model i-1 of size c(i-1), where
c(i) > c(i-1).

$$ y(n,i) = \mathbf{p}(i)^T \mathbf{w}(n,i) + e(n,i) $$   {4.13}

where w(n,i) and p(i) are size c(i) vectors defined in the
same manner as Eq. {4.8} through {4.12}, such that

$$ \mathbf{w}(n,i)^T = [\, \mathbf{w}(n,i-1)^T \mid \mathbf{w}(n,i/i-1)^T \,] $$   {4.14}

and

$$ \mathbf{p}(i)^T = [\, \mathbf{p}(i-1/i)^T \mid \mathbf{p}(i/i-1)^T \,] $$   {4.15}

Following the standard least squares development yields
the normal equations corresponding to equation {4.13}:

$$ \frac{1}{N} [\, W(i)^T W(i) \,]\, \mathbf{p}(i) = \frac{1}{N} W(i)^T \mathbf{y} $$   {4.16}

where we have the c(i) x N transposed data matrix:

$$ W(i)^T = [\, \mathbf{w}(n_1,i) \;\; \mathbf{w}(n_1+1,i) \;\; \cdots \;\; \mathbf{w}(n_2,i) \,] $$   {4.17}

The solution of {4.16} is the coefficient estimation equation:

$$ \mathbf{p}(i) = [\, W(i)^T W(i) \,]^{-1} W(i)^T \mathbf{y} $$   {4.18}

The solution for the model fitting error criterion is:

$$ J^2(i) = \frac{1}{N} \mathbf{y}^T \mathbf{y} - \mathbf{d}(i)^T D(i)^{-1} \mathbf{d}(i) $$   {4.19}

where the c(i) x c(i) least squares matrix is:

$$ D(i) = \frac{1}{N} W(i)^T W(i) $$   {4.20}

and the c(i) x 1 vector d(i) is defined:

$$ \mathbf{d}(i) = \frac{1}{N} W(i)^T \mathbf{y} $$   {4.21}

Instead of solving Eq. {4.18} and {4.19}, we use {4.14}
and {4.15} to develop a set of recursive model evaluation
and model coefficient estimation equations.  Define q(i) to
represent the number of terms in the (i)th model that are
not contained in the (i-1)st model.

$$ q(i) = c(i) - c(i-1) $$   {4.22}
Substitute {4.14} into {4.17}, and define the following:

$$ W(i) = \begin{bmatrix} \mathbf{w}(n_1,i-1)^T & \mathbf{w}(n_1,i/i-1)^T \\ \mathbf{w}(n_1+1,i-1)^T & \mathbf{w}(n_1+1,i/i-1)^T \\ \vdots & \vdots \\ \mathbf{w}(n_2,i-1)^T & \mathbf{w}(n_2,i/i-1)^T \end{bmatrix} = [\, W(i-1) \mid W(i/i-1) \,] $$   {4.23}

where W(i) is the N x c(i) data matrix for the (i)th model,
W(i-1) is the N x c(i-1) data matrix for the (i-1)st model,
and W(i/i-1) is the N x q(i) data matrix for the new terms
in the (i)th model that are not in the (i-1)st model.

Substituting Eq. {4.23} into Eq. {4.16} and simplifying:

$$ \begin{bmatrix} A(i-1) & B(i/i-1) \\ B(i/i-1)^T & A(i/i-1) \end{bmatrix} \mathbf{p}(i) = \begin{bmatrix} \mathbf{h}(i-1) \\ \mathbf{h}(i/i-1) \end{bmatrix} $$   {4.24}

where the following definitions are made for convenience.

$$ A(i-1) = \frac{1}{N} W(i-1)^T W(i-1) $$ , a c(i-1) x c(i-1) matrix   {4.25}

$$ B(i/i-1) = \frac{1}{N} W(i-1)^T W(i/i-1) $$ , a c(i-1) x q(i) matrix   {4.26}

$$ A(i/i-1) = \frac{1}{N} W(i/i-1)^T W(i/i-1) $$ , a q(i) x q(i) matrix   {4.27}

$$ \mathbf{h}(i-1) = \frac{1}{N} W(i-1)^T \mathbf{y} $$ , a c(i-1) column vector   {4.28}

$$ \mathbf{h}(i/i-1) = \frac{1}{N} W(i/i-1)^T \mathbf{y} $$ , a q(i) column vector   {4.29}

The set of linear equations {4.24} is a special permuted
form of the normal equations for the (i)th model.  This
special form results from the ordering of w(n,i) described
by Eq. {4.14}, and leads to an efficient set of recursive
solution equations for J^2(i) and p(i).  It also provides the
basis for our unified approach to the model determination
and growth problem examined in the next chapters.

Recursive Model Growth Solution and Evaluation Equations

The set of simultaneous linear equations {4.24} has a
compact solution for p(i) and J^2(i), based on the previously
obtained p(i-1) and J^2(i-1).  Appendix A contains this
development and we only state and use the results here.
It is convenient to use the following definitions:

$$ F(i) = -A(i-1)^{-1} B(i/i-1) $$ , a c(i-1) x q(i) matrix   {4.30}

$$ G(i) = A(i/i-1) - B(i/i-1)^T A(i-1)^{-1} B(i/i-1) = A(i/i-1) + B(i/i-1)^T F(i) $$ , a q(i) x q(i) matrix   {4.31}

$$ \mathbf{g}(i) = \mathbf{h}(i/i-1) - B(i/i-1)^T \mathbf{p}(i-1) = \mathbf{h}(i/i-1) + F(i)^T \mathbf{h}(i-1) $$ , a q(i) column vector   {4.32}

$$ \mathbf{k}(i) = G(i)^{-1} \mathbf{g}(i) $$ , a q(i) column vector   {4.33}

As long as

$$ |\, A(i/i-1) \,| \neq 0 $$   {4.34}

then the solution of {4.24} is given by

$$ \mathbf{p}(i) = \begin{bmatrix} \mathbf{p}(i-1) \\ \mathbf{0} \end{bmatrix} + \begin{bmatrix} F(i) \\ I \end{bmatrix} \mathbf{k}(i) $$   {4.35}

where 0 is the null vector and I is the identity matrix.
The resulting minimum average sum squared error value is:

$$ J^2(i) = J^2(i-1) - \mathbf{g}(i)^T \mathbf{k}(i) $$   {4.36}

In addition, the following recursive relationships exist:

$$ A(i)^{-1} = \begin{bmatrix} A(i-1)^{-1} + F(i) G(i)^{-1} F(i)^T & F(i) G(i)^{-1} \\ G(i)^{-1} F(i)^T & G(i)^{-1} \end{bmatrix} $$   {4.37}

$$ \mathbf{h}(i)^T = [\, \mathbf{h}(i-1)^T \mid \mathbf{h}(i/i-1)^T \,] $$   {4.38}



C.   CAPABILITIES  OF  THE  RECURSIVE  EVALUATION  TECHNIQUE 
We now demonstrate some of the advantages of using
equation {4.35} as an alternative solution to {4.18} for
model i.  We showed that evaluation of equation {4.18} for a
BVM(d,m) requires the inversion of a matrix of size c(d,m)
given by equation {2.9}.  Examination of {4.30} through
{4.38} reveals that only one smaller inversion of size q(i)
need be performed to evaluate Eq. {4.35} and Eq. {4.36}.
This is the inversion of G(i) required for the calculation
of k(i) in Eq. {4.33}.

For a BVM, the size of matrix G(i) at the (i)th iteration
is given by the following equation.

$$ \text{size of } G(i) = q(i) = c(d_i, m_i) - c(d_{i-1}, m_{i-1}) $$   {4.39}

where d_i, m_i, d_{i-1}, and m_{i-1} are the degree and memory
at iterations i and i-1, respectively.  The computational
cost of recursively evaluating models using Eq. {4.35} and
Eq. {4.36} is a function of the degree and memory of the
various models 1,2,...,i-1,i.

Table  3  represents  an  example  where  we  consider  three 
different  paths  from  the  BVM  with  d=1  and  m=1,  to  the  BVM 
with  d=4  and  m=4.   The  paths  are  denoted  with  arrows  and 
oversized  letters.   The  order  of  complexity  involved  in  this 
example  is  shown  in  Table  4.   A  direct  evaluation  of  the  BVM 
with  d=4  and  m=4  is  given  for  comparison  purposes. 
TABLE 3:  Flow of four growth paths through the chart of the
number of coefficients in a BVM of degree d and memory m


        Model                              Size of matrix    Inversion
Path      i     (d_i,m_i)   c(d_i,m_i)     to be inverted    operations       Total
                                                q(i)         [q(i)]^3/3

 A        1       (1,1)          3                3                 9
          2       (2,2)         20               17              1638
          3       (3,3)        119               99          3.234E+5
          4       (4,4)        714              595          7.021E+7      7.054E+7

 B        1       (1,1)          3                3                 9
          2       (1,2)          5                2                 3
          3       (1,3)          7                2                 3
          4       (1,4)          9                2                 3
          5       (2,4)         54               45          3.037E+4
          6       (3,4)        219              165          1.497E+6
          7       (4,4)        714              495          4.043E+7      4.196E+7

 C        1       (1,1)          3                3                 9
          2       (2,1)          9                6                72
          3       (3,1)         19               10               334
          4       (4,1)         34               15              1125
          5       (4,2)        125               91          2.510E+5
          6       (4,3)        329              204          2.830E+6
          7       (4,4)        714              385          1.900E+7      2.208E+7

 D        1       (4,4)        714              714          1.213E+8      1.213E+8

TABLE  4:   Order  of  Complexity  for  four  growth  paths 
Paths A, B, and C each result in a lower total
computational complexity than the direct evaluation of the
BVM(4,4) model described by path D.  Other paths are
possible but this example is representative of the
computational savings that result from the use of the
recursive algorithm.
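
The path totals can be recomputed with a short Python sketch.  It assumes the closed form c(d,m) = C(2m+1+d, d) - 1, which reproduces the tabulated coefficient counts but is not quoted from Eq. {2.9}, and it uses unrounded values of [q(i)]^3/3, so totals may differ slightly in the last digit from hand-rounded entries.  The helper names are hypothetical.

```python
from math import comb

def c(d, m):
    # Assumed closed form that reproduces the tabulated coefficient counts.
    return comb(2 * m + 1 + d, d) - 1

def path_cost(path):
    """Total of q(i)**3 / 3 along a list of (d, m) models."""
    total, previous = 0.0, 0
    for d, m in path:
        q = c(d, m) - previous       # size of the matrix to invert
        total += q ** 3 / 3
        previous = c(d, m)
    return total

paths = {
    "A": [(1, 1), (2, 2), (3, 3), (4, 4)],
    "B": [(1, 1), (1, 2), (1, 3), (1, 4), (2, 4), (3, 4), (4, 4)],
    "C": [(1, 1), (2, 1), (3, 1), (4, 1), (4, 2), (4, 3), (4, 4)],
    "D": [(4, 4)],
}
for name, path in paths.items():
    print(name, f"{path_cost(path):.3e}")
```

Taking more, smaller steps lowers the total because the cost is cubic in the size of each inverted block.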

The  development  of  the  recursive  algorithm  is  based  on 
three  assumptions. 

(1)  All  model  equations  are  linear  in  their  respective 
coefficient  vectors. 

(2)  The equation of the (i)th model includes all of
the terms contained in the (i-1)st model, plus some new
terms.  This is described mathematically in equations
{4.9}, {4.10}, and {4.14}.

(3)  The  determinant  of  A(i/i-l)  is  not  zero. 

We explicitly avoided any assumptions on the
relationship between w(n,i-1), the terms in the (i-1)st
model, and w(n,i/i-1), the new terms appearing in the (i)th
model.  This results in a general recursive solution
algorithm that is applicable for any type of model growth we
care to consider (see footnote 7).  The following chapter
shows that the existing "recursive-in-order" algorithms
(e.g. Levinson's [Ref. 2-4 and 20-25] and Lattice
[Ref. 5-8 and 39]) are special cases of the recursive
evaluation equations presented here.

7   We still use the limitation on the form of each term
that we defined in Eq. {3.6} for continuity of presentation,
but mention here that other functional forms besides integer
products of observations could be used as long as the
resulting model equation is still linear in the unknown
coefficients.

Chapter V develops several new techniques for specifying
possible model term vectors w(n,i/i-1) for recursive growth
using the BVM.  These are less restrictive than existing
techniques, and allow for more accurate and compact modeling
of typical systems of interest.
V.   TECHNIQUES FOR GROWING MODELS

A.   OVERVIEW  OF  MODEL  GROWTH 

The  objective  of  our  analysis  is  to  determine  patterns 
or  other  key  behavior  properties  from  the  measured  data,  and 
use  this  information  to  efficiently  formulate  a  suitable 
mathematical  model.   This  model  relationship  is  evaluated 
against  both  its  ability  to  predict  behavior  of  the  output 
time  series  within  some  reasonable  and  statistically 
quantifiable degree of accuracy, and its compactness of form.

Earlier work in model development was generally limited
to an assumed linear relationship, and started with
techniques like harmonic analysis and mathematical transform
theory [Ref. 32, 33 and 34].  In the late 1960's, time
series statistical analysis methods were developed by Box
and Jenkins [Ref. 1?].  These methods are related to
transformations of the spectral methods, and approach the
characterization problem from the different perspective of
prediction form models [Ref. 35].  These techniques are not
closed form solutions to the system characterization
problem, but are instead multistage approaches that have
been widely used for the time series analysis of real world
systems [Ref. 36].

The Box and Jenkins technique assumes a general class of
time series models which has been found, experimentally, to
be extremely rich.  The procedure continues as a
trial-and-error process with decision points where the
analyst is required to select the next step based on the
available information [Ref. 37].

Since  existing  linear  time  series  techniques  are  not 
closed  form  solution  algorithms  permitting  full  analysis 
without  human  intervention  and  decisions,  it  would  not  be 
surprising  that  we  are  unable  to  find  a  complete  closed  form 
algorithm  for  the  more  general  nonlinear  relationship  case. 
But  nonlinear  systems  characterization  is  interesting  and 
important,  and  a  solution  is  still  worth  pursuing. 

Chapter  IV  introduced  a  general,  recursive  set  of 
equations  for  efficiently  evaluating  related  sets  of  model 
equations.   Efficiently  handling  the  system  characterization 
problem  requires  a  method  for  determining  what  new  model 
terms  to  add  at  each  iteration;  how  to  "grow"  the  model. 

This  chapter  discusses  existing  techniques  for  recursive 
model  growth  (e.g.  "recursive-in-order")  which  have  been 
applied  to  some  linear  and  nonlinear  systems.   Six  variants 
on  this  type  of  "block-form"  technique  are  developed  for  the 
more  general  BVM  form,  and  the  capabilities  and  limitations 
of  all  these  techniques  are  investigated. 

B.   EXISTING  TECHNIQUES  FOR  MODEL  GROWTH 

The systems identification literature [Ref. 2-8, 20-25,
38 and 39] contains two different techniques for both
specifying and recursively estimating model coefficients,
and no explicit techniques for just the recursive evaluation
of model fitting error.  Both techniques are based on the
concept of considering new model terms that have a unique
"increasing order" relationship to existing model terms, and
take advantage of the special Toeplitz matrix structure that
can be made to occur in the resulting equations for the
coefficient estimates [Ref. 2-4 and 20-25].  This
Toeplitz structure is limited to a restrictive class of
models, and requires the use of the Autocorrelation error
minimization method.  The iterative solution technique is
based on Durbin's simplification of Levinson's algorithm
[Ref. 2], and is well known in the literature.  This
technique is a special restrictive case of the general
recursive algorithm presented in Chapter IV, and the details
of the relationship are given in Appendix B.

Theorem  1  and  Experiment  1  show  that  the  use  of  the 
Autocorrelation  method  typically  produces  a  suboptimal  fit 
when  the  data  sequences  are  finite.   In  terms  of  nonlinear 
models,  the  requirement  for  Toeplitz  (or  even  Block- 
Toeplitz)  structure  for  the  least  squares  matrix  severely 
limits the choice of allowable models.  The "regular-form"
kernel Nonlinear ARMA model used by Perry [Ref. 8] is a
typical  example  of  a  restricted  choice  of  terms  in  the 
model.   This  is  discussed  again  in  the  next  section. 
The second recursive coefficient estimation and growth
technique is based on "Lattice-filtering" [Ref. 5-8, 38
and 39].  This is a prediction-error version of Levinson's
algorithm using a lattice structure implementation rather
than the more conventional tapped delay line type of
implementation.  The error residual signal, after each stage
of the lattice has converged, is used in the computation of
the lattice parameter estimates and signals used in
subsequent stages.  The lattice technique is a special
implementation of the general algorithm of Chapter IV which
has limited applicability.  This technique offers no
advantages for the unrestricted model growth we wish to
consider.

C.   RECURSIVE  MODEL  GROWTH  WITH  THE  BVM 

Chapter II introduced the BVM and showed that it
subsumes the MA, AR, ARMA, Volterra, and Bilinear model
forms.  The recursive model evaluation and coefficient
estimation algorithm presented in Chapter IV can be used for
efficient and meaningful model growth of any of these model
forms.

This section presents six extensions of the
recursive-in-order techniques which apply directly to the
general BVM form.  In all cases, the first model is evaluated
by a direct least squares fit using Eq. {4.1} through Eq. {4.6}.
Subsequent models are evaluated using the recursive
relationships presented in Eq. {4.8} through {4.37}.  We
restricted the upper limit of the lag on the model terms to
be equal to the memory m for both the input and output
terms, for clarity of presentation.  This restriction is
removed in the model growth technique presented in the next
chapter.

The first growth technique starts with the base model,
BVM(d,m) = BVM(1,1) and uses the following fixed procedure.
The fitting error J(d,m) = J(1,1) is evaluated for this
first model, and for subsequent models BVM(1,1+i) where
i=1,2,3,... until J(1,1+i) stops decreasing significantly
(the meaning of which will be discussed later).  This last
significant model is denoted as the new base model BVM(1,m).
The fitting error J(1+j,m) is evaluated for subsequent
models BVM(1+j,m) where j=1,2,3,... until J(1+j,m) stops
decreasing significantly.  This last model is denoted as the
new base model BVM(d,m) and the iteration starts on the
memory m again.  This two-phase iteration is continued as
long as meaningful reduction in fitting error is obtained.
This search strategy is denoted as the "M Directed Growth"
because the initial phase involves growth in memory m
(Fig. 21.a).
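
The two-phase procedure can be sketched in Python as follows.  This is only an illustration under stated assumptions: the fitting error is supplied by a caller-provided function J(d, m), the relative-decrease threshold `rel_tol` stands in for the significance test that the text leaves open, and `toy_J` is a made-up error surface rather than measured data.

```python
def m_directed_growth(J, rel_tol=0.05, d_max=6, m_max=10):
    """Alternate growth in memory m, then degree d, starting at BVM(1,1).

    J(d, m) returns the fitting error of BVM(d, m).  A step is treated as
    significant when it lowers J by more than rel_tol (an assumed test).
    """
    d, m = 1, 1
    best = J(d, m)

    def significant(new):
        return best - new > rel_tol * best

    improved = True
    while improved:
        improved = False
        for grow_memory in (True, False):      # phase 1: m, phase 2: d
            while True:
                nd, nm = (d, m + 1) if grow_memory else (d + 1, m)
                if nd > d_max or nm > m_max:
                    break
                new = J(nd, nm)
                if not significant(new):
                    break
                d, m, best = nd, nm, new
                improved = True
    return d, m, best

def toy_J(d, m):
    # Hypothetical error surface for a degree-2, memory-3 system.
    return 0.01 + 0.5 * max(0, 2 - d) + 0.3 * max(0, 3 - m)

print(m_directed_growth(toy_J))
```

Swapping the order of the two phases in the `for` loop would give the degree-first variant of the same search.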

A second growth technique involves a similar algorithm
with the difference that the first phase iteration is on the
degree d (Fig. 21.b).  This technique is therefore denoted
as "D Directed Growth".  The choice of an appropriate
significance test for switching between the two phases of
these directed growth algorithms remains an open question.

A third growth method is denoted as "Diagonal Growth",
and again starts with the evaluation of the base model
BVM(d,m) = BVM(1,1).  The fitting error of successive models
BVM(1+i,1+i) for i=1,2,3,... is evaluated until J(1+i,1+i)
is acceptably small (Fig. 21.c).

Figure 21.c

FIGURE 21:  Three Growth Methods for the BVM
A fourth technique is denoted as "M-D Zig-Zag Growth"
and starts with the evaluation of the base model BVM(1,1).
The memory m and degree d are alternately incremented until
the fitting error of the resulting model is acceptable
(Fig. 22.a).

A fifth technique denoted as "D-M Zig-Zag Growth"
involves a similar algorithm with the key difference that
the first phase iteration is on the degree d (Fig. 22.b).

A sixth growth strategy is denoted as "Neighbor Growth".
It starts with the base model BVM(1,1) and evaluates its
fitting error J(1,1).  Next we evaluate models that differ
from the base model by one increment of degree, one
increment of memory, or both.  In this starting case we
evaluate J(1,2), J(2,1), and J(2,2).  We denote the model
with the lowest fitting error J as the new base model and
continue this iteration process until the decrease in J is
no longer significant (Fig. 22.c).
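
The Neighbor Growth search can be sketched in Python as follows.  This is an illustration under stated assumptions: J(d, m) is a caller-supplied fitting-error function, the relative threshold `rel_tol` is an assumed significance test, and `toy_J` is a made-up error surface.

```python
def neighbor_growth(J, rel_tol=0.05, d_max=6, m_max=10):
    """Greedy search over the three neighbors of the current base model."""
    d, m = 1, 1
    best = J(d, m)
    while True:
        candidates = [(d, m + 1), (d + 1, m), (d + 1, m + 1)]
        candidates = [(cd, cm) for cd, cm in candidates
                      if cd <= d_max and cm <= m_max]
        if not candidates:
            break
        errors = {cand: J(*cand) for cand in candidates}
        nxt = min(errors, key=errors.get)      # lowest fitting error
        if best - errors[nxt] <= rel_tol * best:
            break                              # decrease not significant
        (d, m), best = nxt, errors[nxt]
    return d, m, best

def toy_J(d, m):
    # Hypothetical error surface for a degree-2, memory-3 system.
    return 0.01 + 0.5 * max(0, 2 - d) + 0.3 * max(0, 3 - m)

print(neighbor_growth(toy_J))
```

Each iteration costs up to three model evaluations, which is the price paid for letting the data choose the growth direction.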


Figure 22.a


Figure 22.b


Figure 22.c


FIGURE  22:   Three  Additional  Growth  Methods  for  the  BVM 

These  six  techniques  all  fall  under  the  general  type  of 
growth  we  refer  to  as  "block-form"  techniques.   At  each 
iteration  the  model  growth  is  accomplished  by  automatically 
including  one  of  a  predetermined  set  of  model  terms. 

One other model growth technique has been proposed by
Perry [Ref. 8].  A special and restricted form of the
nonlinear ARMA model (earlier version of the BVM) and the
Autocorrelation error minimization method are used to
develop a special multichannel lattice form parameter
estimation solution.  Reference 8 does not contain any
experimental verification of this technique.  We analyze
this technique to show its intrinsic weaknesses.

A quadratic nonlinear ARMA model of this special form
can be written as shown below (a translation of Ref 5, p.
193, Eq {4.29} into the notation of this thesis).

[Eq. {5.1}: quadratic nonlinear ARMA model equation; the summation terms are not legible in this copy.]

The proposed technique of Reference 8 requires that the
user prespecify the integer values of the upper summation
limits M , M , and M .  Then Eq. {5.1} becomes a
recursive-in-order M. model equation that can be partially
evaluated by least squares lattice techniques.  Some of the
model terms do not fit the restrictions of the lattice
solution and must be evaluated separately by direct least
squares.  Despite the attractiveness of the potentially
efficient lattice form, the requirement to prespecify all but
one of the upper summation limits of Eq. {5.1} excessively
complicates any systematic model growth with this method.
Reference 8 does not provide any suggestions on how to
select these upper limits.  Since this technique
automatically involves a fixed set of terms at each
iteration, it falls under our definition as a block-form
technique.  It basically is an attempt at fitting a problem
(parameter estimation) to a particular form of solution
(lattice form recursive-in-order), and is inferior to the
growth methods of this chapter and the following.

These  block-form  techniques  can  all  eventually  subsume 
any  system  of  the  BVM  form.   They  unfortunately  require  a 
significant  amount  of  computations  as  the  degree  and  memory 
increase (see Section VI.D).  The first five techniques are
all nonlinear extensions of the linear recursive-in-order
concept  discussed  earlier,  and  are  brute-force  methods.   The 
Neighbor  Growth  technique  offers  an  appealing  approach  for 
potentially  more  efficient  model  growth,  but  also  suffers 
from  the  problem  of  high  computational  cost  when  used  to 
evaluate  the  addition  of  a  large  number  of  model  terms.   We 
continue  the  discussion  of  efficient  model  growth  along  a 
related  line  in  the  next  chapter. 
VI.   SEARCH INDICATORS FOR EFFICIENT MODEL GROWTH

A.   DISCUSSION 

The  block-form  growth  techniques  of  Chapter  V  follow 
from  the  conventional  concept  of  model  growth  in  linear 
systems.   Closer  examination  reveals  that  these  techniques 
have  a  number  of  serious  flaws  when  used  for  nonlinear  model 
growth.   Given  a  finite  amount  of  measurement  data,  we  can 
only  consider  evaluating  models  when  the  number  of  model 
terms  is  less  than  or  equal  to  the  number  of  data  samples. 
Table  1  reveals  that  many  of  these  nonlinear  models  can  only 
be  evaluated  with  extremely  long  sequences  of  data 
measurements.   For  example,  the  model  BVM(3,5)  has  363 
different  terms.   Only  a  few  of  these  terms  may  be  needed  in 
modeling  any  third  degree  dynamic  system  whose  equation 
involves  the  factor  u(n-5)  or  y(n-5).   However,  all  of  these 
363  terms  are  involved  in  the  full  model  evaluation  when  the 
block-form growth techniques are used, and sufficient data
measurements  must  be  available.   The  following  points 
summarize  the  results  of  many  computer  simulated  model 
growth  experiments  (See  Chapter  VII  later). 

As the degree or memory increase, all block-form
modeling techniques automatically consider an increasing
number of terms at each subsequent growth iteration.  This
results in rapidly increasing computational cost, and often
produces an ill-conditioned least squares matrix A(i) due to
the inclusion of several model terms with nearly equivalent
properties in terms of values in this matrix.  By
ill-conditioned, we mean the condition number (the ratio of
the largest to smallest eigenvalue) is numerically large
(e.g. greater than 10000).
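
The effect of nearly equivalent terms on the condition number can be illustrated with a short NumPy sketch.  The data, the seed, and the helper name `condition_number` are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 500
u = rng.standard_normal(N)
near_copy = u + 1e-4 * rng.standard_normal(N)   # nearly equivalent term

def condition_number(W):
    """Ratio of largest to smallest eigenvalue of the least squares matrix."""
    A = W.T @ W / len(W)
    eig = np.linalg.eigvalsh(A)     # ascending eigenvalues of symmetric A
    return eig[-1] / eig[0]

well = condition_number(np.column_stack([u, rng.standard_normal(N)]))
ill = condition_number(np.column_stack([u, near_copy]))
print(well, ill)
```

Two unrelated terms give a ratio near one, while a term and its near copy push the ratio far beyond the 10000 level quoted above.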

A higher condition number for matrix A(i) is related to
less accurate estimates for the coefficients p(i), and
higher fitting error J^2(i).  In some cases the matrix A(i)
becomes so ill-conditioned that it is no longer positive
definite, and the resulting model evaluation is no longer
optimum in any least squares sense.  It has been
experimentally verified that the general use of these
block-form modeling techniques often produces poor results
for nonlinear systems; namely offset model coefficient
estimates, high fitting error, and the inclusion of terms
that are not actually needed.  An example is provided in
Chapter VII.

One  possible  approach  to  overcoming  the  preceding 
problems  is  to  start  with  some  base  model  such  as  BVM(1,1), 
and  use  one  of  the  block-form  techniques  just  to  specify  a 
new  set  of  q(i)  model  terms.   Instead  of  evaluating  these 
candidate  terms  as  a  set,  assume  that  only  a  subset  of  them 
is actually needed.  The problem is how to find the
particular subset of these terms required in the final
model.
This approach is related to standard stepwise regression
analysis [Ref. 40], and also to a recently published
technique known as GMDH, the Group Method of Data Handling
[Ref. 41].  These preceding techniques are general enough to
permit consideration of a wide variety of model terms and
both can avoid the ill-conditioned solution, but they still
have the following major problem.  With q(i) potential new
model terms, there are 2^q(i) possible model equations to
consider.  Except for small values of q(i), the exhaustive
evaluation of each of the corresponding solution equations
rapidly becomes prohibitively expensive.
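
The growth of the exhaustive search can be made concrete with a few lines of Python.  The q values below are the block sizes that appear along the growth paths of Table 4; the helper name `exhaustive_models` is hypothetical.

```python
def exhaustive_models(q: int) -> int:
    """Number of subsets of q candidate terms, i.e. 2**q."""
    return 2 ** q

for q in (6, 17, 45, 99):
    print(q, exhaustive_models(q))
```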

There is the additional problem of a stopping criterion.
Examination of Eq. {4.33} and Eq. {4.36} shows that the
fitting error J^2(i) is monotonically decreasing when new
model terms are added and matrix G(i) is positive
definite.  Therefore, while only some number r out of these
particular q(i) candidate model terms may be needed in the
final model, the fitting error with r+1 terms will still be
lower (with the exception of numerical errors or an exact
model fit).
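
This behavior is easy to reproduce numerically.  In the NumPy sketch below, which uses made-up data and a hypothetical helper `fitting_error`, the output depends on a single term, yet adding an unrelated candidate term still lowers the fitting error.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 100
u = rng.standard_normal(N)
y = 2.0 * u + 0.1 * rng.standard_normal(N)   # true model has one term

def fitting_error(W, y):
    """Average squared residual of the least squares fit of y on W."""
    p = np.linalg.lstsq(W, y, rcond=None)[0]
    return np.mean((y - W @ p) ** 2)

J_true = fitting_error(u[:, None], y)
junk = rng.standard_normal(N)                # unrelated candidate term
J_extra = fitting_error(np.column_stack([u, junk]), y)
print(J_true, J_extra)
```

A fitting error that keeps falling therefore cannot by itself signal that the needed terms have all been found.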

In all practical situations, we have only finite
measurement data and finite computer resources, and are left
with an interesting and not unusual problem.  When the
preceding model growth techniques yield ill-conditioned
solutions or increase to a point (1) greater than the finite
data can handle, or (2) beyond the computational resources,
are there any prudent procedures that we can employ?  The
field of artificial intelligence has provided some
motivation in related problems including chess playing
programs and voice-recognition methods.  The basis for
comparison is the intractability of the exhaustive solution
when there is only finite data, time, and computational
power.  Since the performance modeling problem is
interesting, and has practical applications, we develop a
semi-heuristic technique to follow when there are not enough
resources for the exhaustive solution.

This chapter presents the new concept of "search
indicators" for efficiently growing a model of an unknown
linear or nonlinear system from a finite set of input-output
measurements.  Rather than attempting to solve the typically
large set of normal equations for all of the new candidate
model terms, the proposed concept uses an easily computable
search indicator (scalar value), or a set of such search
indicators, for each candidate model term contained in
w(n,i/i-1) at each growth iteration i.  The relative values
of these search indicators are used to systematically
exclude those terms expected (see footnote 8) to have
insignificant effect on reducing the fitting error.  The
remaining term or terms are retained in the vector
w(n,i/i-1) for this (i)th model, and used to form a much
smaller set of normal equations that are efficiently
evaluated by the general recursive equations presented in
Chapter IV.

8   The word expected is used to acknowledge the heuristic
nature of some of these search indicators and of their use.
We have been unable to prove that any technique based on
their use is guaranteed to pick the optimum model terms at
each growth iteration.  These indicators are logical factors
based on the recursive model evaluation equations, and the
results of many experiments show that techniques using some
search indicators provide for highly efficient model growth.

This proposed two-phase concept offers a number of
improved capabilities over the existing growth techniques.

(1)  Since we compute the search indicators for each
candidate model term separately, we can consider
more potential model terms than the number of data
measurements. As a result, the terms of nonlinear
models with large degree and memory can now be
considered. We must, of course, eliminate enough
terms such that the reduced model form evaluated in
the second phase has fewer unknowns than the number
of data measurements.

(2)  This technique allows the evaluation of widely
different model terms at any iteration. Unlike the
recursive-in-order (or more general block-form)
techniques, there is no longer the implicit
restriction that the current model contain all of
the possible set of input, output, and bivariate
terms specified by a particular degree and memory.

(3)  The initial set of model terms w(n,1) can be
better chosen by the search indicator concept,
rather than by blindly picking a predefined base
model like BVM(1,1). The computer simulated
experiments of Chapter VII show that this property
allows for the efficient characterization of a
general class of systems having an input-output
delay L (e.g. where terms containing the factor
u(n-k) for k = 0, 1, 2, ..., L-1 are not needed in the
final model). In cases where the system under
consideration has a delay factor L, the block-form
techniques fail to recognize and exploit this
property, and often converge on a more complex
model.

(4)  The search indicator technique selects one or
more candidate model terms in the first phase,
produces a much smaller matrix A(i/i-1), and
therefore significantly reduces the computational
burden. It is also capable of efficiently handling
the previously discussed problem of ill-conditioning
caused by nearly equivalent model terms.

These features are demonstrated in the following sections.
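
The two-phase idea in points (1) through (4) can be sketched in
a few lines of Python. This is only an illustration under stated
assumptions: the function names, the candidate labels, the data
values, and the placeholder score are all hypothetical and are not
taken from the original work. The sketch shows that each candidate
is scored on its own, so the number of candidates (six here) may
exceed the number of data points (four here), provided that few
enough terms survive into the second phase.

```python
from typing import Callable, Dict, List, Sequence


def screen_candidates(
    candidates: Dict[str, Sequence[float]],
    indicator: Callable[[Sequence[float]], float],
    keep: int,
) -> List[str]:
    """Phase one: score each candidate term on its own and keep the best few."""
    scores = {name: indicator(signal) for name, signal in candidates.items()}
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return ranked[:keep]


# Hypothetical data: only N = 4 residual samples, but 6 candidate term signals.
residual = [1.0, -1.0, 2.0, 0.5]
candidates = {
    "u(n)": [1.0, -1.0, 2.0, 0.5],
    "u(n-1)": [0.5, 1.0, -1.0, 2.0],
    "y(n-1)": [0.0, 1.0, 1.0, 0.0],
    "u(n)^2": [1.0, 1.0, 4.0, 0.25],
    "u(n)y(n-1)": [0.0, -1.0, 2.0, 0.0],
    "y(n-1)^2": [0.0, 1.0, 1.0, 0.0],
}


def placeholder_indicator(signal: Sequence[float]) -> float:
    """Illustrative score: magnitude of the time-averaged product with the residual."""
    return abs(sum(w * e for w, e in zip(signal, residual)) / len(signal))


# Phase two (not shown) would solve the small set of normal equations
# for the surviving terms only.
selected = screen_candidates(candidates, placeholder_indicator, keep=2)
print(selected)
```

Any scalar score could be passed as `indicator`; the placeholder
used here is chosen only to make the sketch runnable.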

The next section defines various possible search
indicators based on the signals, vectors, and matrices
contained in the recursive model growth solution and
evaluation equations introduced in Chapter IV. Some
physical interpretation is given for each of these search
indicators, and the set is reduced to a smaller set worthy
of further investigation. Results of many computer
simulated experiments have shown the superior growth
capabilities of the proposed concept, and confirmed that the
search indicator technique provides significant
computational savings and accuracy improvements over all of
the previously discussed growth techniques. Examples of
model growth are provided in the computer simulated and
real-world experiments of Chapter VII.




B.   DEVELOPMENT  OF  SEARCH  INDICATORS 

The notation w_j(n,i/i-1) is used to represent the (j)th
model term (out of the candidate set of terms) that we
consider adding at the (i)th iteration, given that we have
previously evaluated and accepted a model at the (i-1)st
iteration. We let q(i) still represent the number of
candidate model terms considered at the (i)th iteration, and
therefore j = 1, 2, 3, ..., q(i).

The  following  development  is  partially  based  on  the 
notation  for  the  signals,  vectors,  and  matrices  contained 
within  the  recursive  solution  and  evaluation  equations  of 
Chapter  IV.   One  important  note  of  clarification  needs  to  be 
made  at  this  point  to  minimize  potential  confusion. 

The set of equations {4.8} through {4.38} in Chapter IV
was developed to evaluate the improvement in model fitting
error and calculate the new coefficient estimates based on
adding the entire candidate set of terms w(n,i/i-1) to the
model with the existing set of terms w(n,i-1). The set of
search indicators developed in this chapter is to be
calculated for each of the candidate model terms w_j(n,i/i-1)
in the candidate set w(n,i/i-1). These indicators are
designed to each give some partial metric or measure for the
improvement in model fitting error. As a result of this
development, many of the matrices and vectors defined in
Eq. {4.8} through Eq. {4.38} for the evaluation of multiple
model terms are used in this chapter in a reduced form (e.g.
vectors and scalars, respectively) for the search indicator
evaluation of each term. Whenever possible we use the lower
case vector version of the matrix designation to represent
the corresponding reduced form vector (e.g. w_j(i/i-1) for
W(i/i-1)). Likewise we use the scalar representation to
describe the corresponding reduced form of a vector (e.g.
h_j(i/i-1) for h(i/i-1)).

These  reductions  are  made  only  for  clarity  in  the 
development  of  the  search  indicators.   Once  a  subset  of 
model  terms  is  selected  by  the  search  indicators,  the  full 
form  equations  of  Chapter  IV  are  used  to  evaluate  the 
fitting  error  and  coefficient  estimates.   It  is  noted, 
however,  that  some  of  the  factors  calculated  in  the 
evaluation  of  the  search  indicators  for  the  candidate  model 
terms can be used again in the actual evaluation of the
model  performance.   Thus,  some  of  these  computations  will 
serve  double  duty. 

A primary concern is efficiency of computation, so the
numerical complexity (number of multiplications and
divisions) involved in calculating each search indicator has
been analyzed. The notational convention O(N) will be used
to denote N multiplication or division operations. This
complexity notation is included with the development of each
search indicator, and summarized with examples in Table 5
and Table 6 after the development of all of the indicators.

Denote the size c(i-1) of the (i-1)st model as P, and
the number of data points in the error minimization as N.
Since we have completed the evaluation of the (i-1)st
model, the following matrices and vectors are available.

W(i-1)  =  an N x P matrix given by Eq. {4.23}

A(i-1)  =  a P x P matrix given by Eq. {4.25}

h(i-1)  =  a P x 1 column vector given by Eq. {4.28}

We also have A(i-1)^{-1} and \underline{c}(i-1), the coefficient
vector.

Some preliminary vectors needed for the development of
the search indicators are presented at this point.

\[ \underline{w}_j(i/i-1)^T = [\, w_j(n_1,i/i-1),\; w_j(n_1+1,i/i-1),\; \ldots,\; w_j(n_2,i/i-1) \,] \tag{6.1} \]

=  an N x 1 transposed vector of the signal specified
by the (j)th candidate model term over the
interval (n_1, n_2). This is the reduced version of
the data matrix W(i/i-1) given by Eq. {4.23} in
the case of just the (j)th candidate model term.

\[ e(n,i-1) = y(n) - \underline{w}(n,i-1)^T\,\underline{c}(i-1) \quad \text{for } n_1 \le n \le n_2 \tag{6.2} \]

=  value of the error residual at discrete time n
from the (i-1)st model iteration. This can be
computed with P multiplications per point.

\[ \underline{e}(i-1)^T = [\, e(n_1,i-1),\; e(n_1+1,i-1),\; \ldots,\; e(n_2,i-1) \,] \tag{6.3} \]

=  an N x 1 column vector of the residual sequence
values from the (i-1)st iteration over the
interval (n_1, n_2). This requires PN multiplications.

\[ \underline{b}_j(i/i-1) = \frac{1}{N}\, W(i-1)^T\, \underline{w}_j(i/i-1) \tag{6.4} \]

=  the reduced version of matrix B(i/i-1) given by
Eq. {4.26} in the case of just the (j)th candidate
model term. Since this is a P x N matrix times an
N x 1 column vector, the cost is O(PN+1).

\[ \underline{f}_j(i/i-1) = -A(i-1)^{-1}\, \underline{b}_j(i/i-1) \tag{6.5} \]

=  the reduced version of matrix F(i) given by
Eq. {4.30} in the case of just the (j)th candidate
model term. Since A(i-1)^{-1} is a P x P matrix that
we already computed, and b_j(i/i-1) is a P x 1
column vector obtained in Eq. {6.4} at a cost of
O(PN+1), the total cost of computing f_j(i/i-1) is

O(P^2) + O(PN+1) = O(P^2+PN+1).
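
The preliminary vectors of Eq. {6.2} through Eq. {6.5} can be
sketched in plain Python as follows. This is a minimal
illustration, not the implementation used in this work. It
assumes the time-averaged scaling A(i-1) = (1/N) W(i-1)^T W(i-1)
and h(i-1) = (1/N) W(i-1)^T y for the Chapter IV quantities, and
it solves a linear system rather than storing an explicit
inverse. All function names and data values are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def time_avg_product(W: Matrix, v: Vector) -> Vector:
    """Return (1/N) W^T v for an N x P matrix W stored as a list of N rows."""
    n = len(W)
    return [sum(W[r][k] * v[r] for r in range(n)) / n for k in range(len(W[0]))]


def preliminary_vectors(W: Matrix, y: Vector, w_j: Vector) -> Tuple[Vector, Vector, Vector]:
    """Return e(i-1), b_j(i/i-1) and f_j(i/i-1) for one candidate term."""
    n, p = len(W), len(W[0])
    # Assumed scaling: A(i-1) = (1/N) W^T W and h(i-1) = (1/N) W^T y.
    A = [time_avg_product(W, [row[k] for row in W]) for k in range(p)]
    h = time_avg_product(W, y)
    c = solve(A, h)  # coefficient vector of the previous model
    e = [y[r] - sum(W[r][k] * c[k] for k in range(p)) for r in range(n)]  # Eq. {6.2}
    b_j = time_avg_product(W, w_j)  # Eq. {6.4}
    f_j = [-x for x in solve(A, b_j)]  # Eq. {6.5}
    return e, b_j, f_j


# Hypothetical example: previous model terms {1, u(n)}, candidate term u(n)^2.
u = [0.0, 1.0, 2.0, 3.0, 4.0]
W_prev = [[1.0, x] for x in u]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
w_cand = [x * x for x in u]
e_prev, b_j, f_j = preliminary_vectors(W_prev, y, w_cand)
print(e_prev, b_j, f_j)
```

Note that e(i-1) does not depend on the candidate, so in practice
it would be computed once per growth iteration and reused.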


Twelve  different  search  indicators  were  developed  and 
examined  in  this  work.   The  initial  set  of  search  indicators 
I(j,1)  through  I(j,8)  was  developed  from  an  algebraic 
perspective;  i.e.  these  relationships  arose  from  an 
examination  of  the  general  recursive  evaluation  equations  of 
Chapter  IV,  under  the  condition  of  adding  a  single  new  model 
term w_j(n,i/i-1). Each search indicator therefore has a
direct relationship to the actual system characterization
experiment.

The first search indicator is the time average of the
product of the signal specified by the (j)th candidate model
term, and the output signal of the system.

\[ I(j,1) = \frac{1}{N}\, \underline{w}_j(i/i-1)^T\, \underline{y} = \frac{1}{N} \sum_{n=n_1}^{n_2} w_j(n,i/i-1)\, y(n) \tag{6.6} \]

Since w_j(i/i-1) is an N x 1 vector and y is an N x 1 vector,
the calculation of I(j,1) requires O(N+1) operations for
each candidate model term.

This indicator corresponds to the scalar version
h_j(i/i-1) of the vector h(i/i-1) defined by Eq. {4.29}.
While intuitively appealing as the "empirical" cross-correlation
between the output of the system under test and
the signal specified by the candidate model term, this
indicator has a basic flaw. It is a function only of the
output of the system and the candidate model term, and as
such, does not depend on the particular terms in the
previous model. Numerous computer simulated experiments
have verified that I(j,1), taken alone, is unsuitable as a
reliable search indicator for model growth.
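
The flaw described above can be seen in a small hypothetical
example. The signals and the system y(n) = 2u(n) below are
invented purely for illustration. The previous model already
fits the data exactly, yet I(j,1) still assigns a large value to
a candidate that merely repeats the existing term, because the
indicator never looks at the previous model.

```python
from typing import Sequence


def indicator_1(w_j: Sequence[float], y: Sequence[float]) -> float:
    """I(j,1): time average of the candidate signal times the system output."""
    return sum(w * out for w, out in zip(w_j, y)) / len(y)


u = [1.0, 2.0, 3.0, 4.0]
y = [2.0 * x for x in u]  # hypothetical system: y(n) = 2 u(n)

# Assume the previous model already holds the term u(n) with coefficient 2,
# so its residual e(n,i-1) is identically zero.
residual = [yn - 2.0 * un for yn, un in zip(y, u)]

# The candidate below repeats the existing term and cannot improve the fit.
redundant_score = indicator_1(u, y)
residual_average = sum(w * e for w, e in zip(u, residual)) / len(u)

print(redundant_score, residual_average)
```

The time average against the residual is zero for the same
candidate, which is the behaviour one would want from an
indicator that accounts for the terms already in the model.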

The second search indicator is the value corresponding
to the reduced version g_j(i) of the vector g(i) of
Eq. {4.32}, in the case of just the (j)th candidate model
term.

\[ I(j,2) = \frac{1}{N}\, \underline{w}_j(i/i-1)^T\, \underline{y} + \underline{f}_j(i/i-1)^T\, \underline{h}(i-1) = I(j,1) + \underline{f}_j(i/i-1)^T\, \underline{h}(i-1) \tag{6.7} \]

where w_j(i/i-1) is an N x 1 vector and y is an N x 1 vector.
Since f_j(i/i-1) is a P x 1 vector obtained at the cost
O(P^2+PN+1), and h(i-1) is a P x 1 vector, the calculation of
I(j,2) requires a total of O(P^2+PN+P+N+2) operations for
each candidate model term.

We digress momentarily to examine some of the
characteristics of the full vector g(i) given by Eq. {4.32}.
Substituting {4.26} and {4.29} into {4.32} produces:

\[ \begin{aligned}
\underline{g}(i) &= \frac{1}{N}\, W(i/i-1)^T\, \underline{y} - \frac{1}{N}\, W(i/i-1)^T\, W(i-1)\, \underline{c}(i-1) \\
&= \frac{1}{N}\, W(i/i-1)^T\, [\, \underline{y} - W(i-1)\, \underline{c}(i-1) \,] \\
&= \frac{1}{N}\, W(i/i-1)^T\, \underline{e}(i-1)
\end{aligned} \tag{6.8} \]

or equivalently, in transposed form,
g(i)^T = (1/N) e(i-1)^T W(i/i-1).

Since {e(n,i-1)} is the prediction error sequence of the
(i-1)st model, and the vector e(i-1) contains the values of
{e(n,i-1)}, we see that g(i) is a vector whose (j)th element
is the normalized inner product of e(i-1) and the (j)th
column of W(i/i-1). Examination of Eq. {4.23} and Eq. {6.1}
reveals that the (j)th column of W(i/i-1) is the vector
w_j(i/i-1), and yields the following expression,

\[ \begin{aligned}
g_1(i) &= \frac{1}{N}\, \underline{e}(i-1)^T\, \underline{w}_1(i/i-1) \\
g_2(i) &= \frac{1}{N}\, \underline{e}(i-1)^T\, \underline{w}_2(i/i-1) \\
&\;\;\vdots \\
g_{q(i)}(i) &= \frac{1}{N}\, \underline{e}(i-1)^T\, \underline{w}_{q(i)}(i/i-1)
\end{aligned} \tag{6.9} \]

where

\[ \underline{g}(i)^T = [\, g_1(i),\; g_2(i),\; \ldots,\; g_j(i),\; \ldots,\; g_{q(i)}(i) \,] \tag{6.10} \]

and where

\[ q(i) = c(i) - c(i-1) \tag{6.11} \]

Examination of Eq. {6.9} shows that the value g_j(i) is
the time average of the product of the error residual signal
and a signal formed by products, and powers of products, of
the input-output measurements corresponding to the
specification of the (j)th candidate model term. This gives
physical interpretation and increased meaning to the value
g_j(i), which is contained in Eq. {6.7} (search indicator
two), and equivalently in Eq. {6.12} below as search
indicator three.

\[ I(j,3) = \frac{1}{N}\, \underline{w}_j(i/i-1)^T\, \underline{e}(i-1) = \frac{1}{N} \sum_{n=n_1}^{n_2} w_j(n,i/i-1)\, e(n,i-1) = g_j(i) \tag{6.12} \]

Since w_j(i/i-1) is an N x 1 vector and e(i-1) is an N x 1
vector obtained at the cost O(PN), the calculation of
I(j,3) requires O(PN+N+1) operations for the first candidate
model term. For the second and subsequent candidate model
terms the cost is reduced to O(N+1) since e(i-1) has already
been calculated. Tables 5 and 6 show that I(j,3) can be
computed much more efficiently than I(j,2). It is also
intuitively appealing to be using the error residual from
the previous model in evaluating the usefulness of a new
candidate model term.
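
The equivalence of search indicators two and three can be
checked numerically. The sketch below is an illustration only.
It restricts the previous model to a single term (P = 1) so that
A(i-1), h(i-1), b_j and f_j are scalars, assumes the
time-averaged scaling A(i-1) = (1/N) w_0^T w_0 and
h(i-1) = (1/N) w_0^T y, and uses invented data and names.

```python
from typing import Sequence


def avg_dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Time average of the elementwise product of two equal-length signals."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


# Hypothetical data: one existing model term w0, output y, one candidate w_j.
w0 = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 5.0, 4.0]
w_j = [1.0, 0.0, 1.0, 0.0]

A = avg_dot(w0, w0)  # scalar A(i-1), assumed scaling (1/N) w0^T w0
h = avg_dot(w0, y)  # scalar h(i-1)
c = h / A  # least squares coefficient of the previous model
e_prev = [yn - c * wn for yn, wn in zip(y, w0)]  # residual, Eq. {6.2}

b_j = avg_dot(w0, w_j)  # Eq. {6.4}
f_j = -b_j / A  # Eq. {6.5}

i2 = avg_dot(w_j, y) + f_j * h  # Eq. {6.7}
i3 = avg_dot(w_j, e_prev)  # Eq. {6.12}
print(i2, i3)
```

Both routes give the same number, but I(j,3) needs only the
stored residual and N further multiplications per candidate.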

If the (i-1)st model produces an exact match to the
measured input-output data {u(n)} and {y(n)}, then g_j(i) is
zero. This occurs regardless of the choice of the new term
w_j(n,i/i-1) considered for inclusion in the (i)th model. It
follows that the absolute value of g_j(k) should be a useful
measure at any step k < i in the growth iteration. It is
conjectured that this represents a measure of the relative
benefit of that particular model term as compared with other
possible choices of terms. In this regard the term that
produces the largest absolute value of g_j(k) would also
probably result in the smallest value of J^2(i), the error
fitting criterion. This last point remains to be
demonstrated.

The above discussion indicates that I(j,3) should be a
potentially good search indicator, either alone, or in
combination with other factors. We will later consider
other search indicators based on g_j(i).

The fourth search indicator I(j,4) is the time average
of the square of the signal specified by the (j)th candidate
model term.

\[ I(j,4) = \frac{1}{N}\, \underline{w}_j(i/i-1)^T\, \underline{w}_j(i/i-1) = \frac{1}{N} \sum_{n=n_1}^{n_2} [\, w_j(n,i/i-1) \,]^2 \tag{6.13} \]

Since w_j(i/i-1) is an N x 1 vector, the calculation of I(j,4)
requires O(N+1) operations for each candidate model term.
This corresponds to the reduced version of matrix A(i/i-1)
given by Eq. {4.25}. It can be efficiently computed, but
suffers from the same flaw as I(j,1): it does not depend on
the particular terms in the previous model.

The fifth search indicator is the scalar corresponding
to the reduced version of matrix G(i) given by Eq. {4.32} in
the case of one additional coefficient in the (i)th model.

\[ I(j,5) = \frac{1}{N}\, \underline{w}_j(i/i-1)^T\, \underline{w}_j(i/i-1) + \underline{b}_j(i/i-1)^T\, \underline{f}_j(i/i-1) = I(j,4) + \underline{b}_j(i/i-1)^T\, \underline{f}_j(i/i-1) \tag{6.14} \]

Since b_j(i/i-1) is a P x 1 vector, and f_j(i/i-1) is a P x 1
vector, the cost of computing f_j(i/i-1) is O(P^2+PN+1) and
includes the cost of computing b_j(i/i-1). Therefore the
calculation of I(j,5) requires a total of O(P^2+PN+P+N+2)
operations for each candidate model term. Examination of
Eq. {4.33} and {4.36} reveals that the scalar value I(j,5) is
inversely related to the reduction in the fitting error that
results if the single candidate model term is brought into
the model. As such, there is reason to expect that I(j,5)
would be a good search indicator, either alone or in
combination with other factors. Unfortunately, the high
cost for I(j,5) precludes its general use.

The sixth search indicator is the scalar value
corresponding to the reduced version of vector k(i) from Eq.
{4.33} in the case of one additional model term.

\[ I(j,6) = \frac{I(j,3)}{I(j,5)} = \frac{\frac{1}{N}\, \underline{w}_j(i/i-1)^T\, \underline{e}(i-1)}{\frac{1}{N}\, \underline{w}_j(i/i-1)^T\, \underline{w}_j(i/i-1) + \underline{b}_j(i/i-1)^T\, \underline{f}_j(i/i-1)} \tag{6.15} \]

where w_j(i/i-1) is an N x 1 vector, e(i-1) is an N x 1 vector
obtained at the cost O(PN), and f_j(i/i-1) is a P x 1 vector
obtained at the cost O(P^2+PN+1) which includes the cost of
computing the P x 1 vector b_j(i/i-1). Therefore, the
calculation of I(j,6) requires O(P^2+2PN+2N+P+4) operations
for the first candidate model term. For the second and
subsequent candidate model terms the cost is reduced to
O(P^2+PN+2N+P+4) since e(i-1) has already been calculated.

Examination of Eq. {4.36} reveals that the value of
I(j,6) is directly related to the reduction in the fitting
error J^2(i) that results from including the candidate term
in the model. Tables 5 and 6 indicate, however, that there
is a very high computational cost associated with this
search indicator.

The seventh search indicator is the value of the change
in the error criterion J^2(i) as a result of including the
candidate model term. It is based on Eq. {4.32}, {4.33},
and {4.36}.

\[ I(j,7) = \frac{I(j,2)^2}{I(j,5)} = \frac{\left[ \frac{1}{N}\, \underline{w}_j(i/i-1)^T\, \underline{y} + \underline{f}_j(i/i-1)^T\, \underline{h}(i-1) \right]^2}{\frac{1}{N}\, \underline{w}_j(i/i-1)^T\, \underline{w}_j(i/i-1) + \underline{b}_j(i/i-1)^T\, \underline{f}_j(i/i-1)} \tag{6.16} \]

where w_j(i/i-1) is an N x 1 vector, y is an N x 1 vector,
h(i-1) is a P x 1 vector, and f_j(i/i-1) is a P x 1 vector
obtained at the cost O(P^2+PN+1) which includes the cost of
computing the P x 1 vector b_j(i/i-1). Therefore the total
cost of computing I(j,7) is O(P^2+PN+2P+2N+5)
operations for each candidate model term.

This is the exact value of the reduction in the error
criterion resulting from including the candidate model term.
As such, it probably should not be called an "indicator".
It is included as a control indicator since it has the
desired property of exactly describing the performance
improvement. Tables 5 and 6 show that this indicator is
extremely expensive to compute. We next reduce the
computational complexity using I(j,3).

The eighth search indicator is the value of the change
in the error criterion J^2(i), as a result of including the
candidate model term and using the error residual signal of
the model from the previous growth iteration. It is based
on Eq. {6.8}, {4.33}, and {4.36}. I(j,8) has the following
form:

\[ I(j,8) = \frac{I(j,3)^2}{I(j,5)} = \frac{\left[ \frac{1}{N}\, \underline{w}_j(i/i-1)^T\, \underline{e}(i-1) \right]^2}{\frac{1}{N}\, \underline{w}_j(i/i-1)^T\, \underline{w}_j(i/i-1) + \underline{b}_j(i/i-1)^T\, \underline{f}_j(i/i-1)} \tag{6.17} \]

where w_j(i/i-1) is an N x 1 vector, e(i-1) is an N x 1 vector
obtained at the cost O(PN), and f_j(i/i-1) is a P x 1 vector
obtained at the cost O(P^2+PN+1) which includes the cost of
computing the P x 1 vector b_j(i/i-1). Therefore the
calculation of I(j,8) requires O(P^2+2PN+2N+P+5) operations
for the first candidate model term. For the second and
subsequent candidate model terms the cost is reduced to
O(P^2+PN+2N+P+5) since e(i-1) has already been calculated.

This is the exact value of the reduction in the error
criterion resulting from including the candidate term, using
the alternate and less costly computation for g_j(i)
discussed previously. Unfortunately the cost of computing
the denominator of Eq. {6.17} predominates, and we have an
alternative, but still costly, directly related search
indicator.
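
The claim that I(j,8) is the exact reduction in the error
criterion can be checked on a small case. The sketch below is
an illustration under stated assumptions: the previous model has
a single term (P = 1), the error criterion J^2 is taken to be the
time-averaged squared residual, A(i-1) is taken as
(1/N) w_0^T w_0, and all data and function names are
hypothetical. The two-term refit uses the closed-form solution
of the 2 x 2 normal equations.

```python
from typing import List, Sequence, Tuple


def avg_dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Time average of the elementwise product of two equal-length signals."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def fit_one_term(w0: Sequence[float], y: Sequence[float]) -> Tuple[float, List[float]]:
    """Least squares fit y ~ c*w0; return (mean squared residual, residual)."""
    c = avg_dot(w0, y) / avg_dot(w0, w0)
    e = [yn - c * wn for yn, wn in zip(y, w0)]
    return avg_dot(e, e), e


def fit_two_terms(w0: Sequence[float], w1: Sequence[float], y: Sequence[float]) -> float:
    """Least squares fit y ~ c0*w0 + c1*w1; return the mean squared residual."""
    a00, a01, a11 = avg_dot(w0, w0), avg_dot(w0, w1), avg_dot(w1, w1)
    h0, h1 = avg_dot(w0, y), avg_dot(w1, y)
    det = a00 * a11 - a01 * a01
    c0 = (h0 * a11 - h1 * a01) / det
    c1 = (a00 * h1 - a01 * h0) / det
    e = [yn - c0 * x0 - c1 * x1 for yn, x0, x1 in zip(y, w0, w1)]
    return avg_dot(e, e)


def indicator_8(w0: Sequence[float], w_j: Sequence[float], e_prev: Sequence[float]) -> float:
    """I(j,8) = I(j,3)^2 / I(j,5) for a one-term previous model."""
    b_j = avg_dot(w0, w_j)  # Eq. {6.4}
    f_j = -b_j / avg_dot(w0, w0)  # Eq. {6.5}
    i3 = avg_dot(w_j, e_prev)  # Eq. {6.12}
    i5 = avg_dot(w_j, w_j) + b_j * f_j  # Eq. {6.14}
    return i3 * i3 / i5


w0 = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 5.0, 4.0]
w_j = [1.0, 0.0, 1.0, 0.0]

j_prev, e_prev = fit_one_term(w0, y)
j_new = fit_two_terms(w0, w_j, y)
i8 = indicator_8(w0, w_j, e_prev)
print(j_prev - j_new, i8)
```

The drop in the mean squared residual after refitting matches
I(j,8) to within rounding, without the refit being needed to
obtain the indicator.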

The  next  three  search  indicators  were  developed  in  an 
attempt  to  recognize  some  additional  factors  that  could  be 
used  to  reduce  the  computational  burden  of  the  original  set. 
Their  physical  interpretations  are  not  as  clear,  but  they 
are  logical  extensions  to  consider. 

The ninth search indicator is the value of the L2-norm
of the vector b_j(i/i-1) given by Eq. {6.4}, that is:

\[ I(j,9) = \| \underline{b}_j(i/i-1) \| = \left[ \underline{b}_j(i/i-1)^T\, \underline{b}_j(i/i-1) \right]^{1/2} \tag{6.18} \]

Since b_j(i/i-1) is a P x 1 vector obtained at the cost
O(PN+1), the calculation of I(j,9) requires O(PN+P+2)
operations for each candidate model term. This is the
L2-norm of a vector composed of time averages between the
signals specified by each of the existing model terms and
the signal specified by the new candidate model term. Since
this vector corresponds to the reduced version of matrix
B(i/i-1) appearing in Eq. {4.30} through {4.32} and Eq.
{6.4}, it was conjectured that its length might have some
significance. Unfortunately, it also has a high cost and
therefore offers no advantages.

The tenth search indicator is the value of the L2-norm
of the vector f_j(i/i-1) given by Eq. {6.5}.

\[ I(j,10) = \| \underline{f}_j(i/i-1) \| = \left[ \underline{f}_j(i/i-1)^T\, \underline{f}_j(i/i-1) \right]^{1/2} \tag{6.19} \]

Since f_j(i/i-1) is a P x 1 vector obtained at the cost
O(P^2+PN+1), the calculation of I(j,10) requires O(P^2+PN+P+2)
operations for each candidate model term. This indicator is
the L2-norm of the matrix product of the preceding vector of
time averages in b_j(i/i-1) and the inverse of the previous
model least squares matrix A(i-1). This resulting vector
corresponds to the reduced version of matrix F(i), appearing
in Eq. {4.30}, Eq. {4.31}, and Eq. {4.37}. It was
conjectured that the length of this vector might have some
significance to the growth problem. Tables 5 and 6 show
that it suffers from a similar high computational cost.

The eleventh search indicator is the inner product of the
vectors b_j(i/i-1) and f_j(i/i-1).

\[ I(j,11) = \underline{b}_j(i/i-1)^T\, \underline{f}_j(i/i-1) \tag{6.20} \]

This value appears in the calculation of matrix G(i) in Eq.
{4.31} and also in I(j,5). Since f_j(i/i-1) is a P x 1
vector obtained at the cost O(P^2+PN+1) which includes the
cost of computing the P x 1 vector b_j(i/i-1), the
calculation of I(j,11) requires O(P^2+PN+P+1) operations for
each candidate model term. This second group of search
indicators I(j,9) through I(j,11) does not appear to offer any
advantages over the first group of indicators.

At this point we will leave the domain of proven results
and use experimental analysis to develop other search
indicators for the model growth problem. We provide
mathematical justification wherever possible, but these are
the results of mathematical rationalization based on
experimental findings. The following factor is the main
result obtained after many detailed experiments using
computer simulated systems and a controlled input sequence.

The twelfth search indicator is defined as the ratio of
the square of the value of the third search indicator,
divided by the fourth search indicator.

\[ I(j,12) = \frac{I(j,3)^2}{I(j,4)} \tag{6.21} \]

This twelfth indicator was experimentally developed as a
heuristic compromise to the computational and performance
limitations of some of the preceding indicators. One
explanation of the meaning for this search indicator is
described below.

The improvement in the fitting error resulting from the
involvement of just the (j)th candidate model term is
defined as J_j^2(i/i-1), and can be obtained from Eq. {4.36} by
reducing this general vector equation to its simpler
one-term model form. Since g(i) and k(i) become g_j(i) and
k_j(i), respectively, we obtain:

\[ J_j^2(i/i-1) = J^2(i) - J^2(i-1) = [\, \underline{g}(i)^T\, \underline{k}(i) \,]_j = g_j(i)\, k_j(i) \tag{6.22} \]

Substituting Eq. {4.33} into {6.22} yields:

\[ J_j^2(i/i-1) = g_j(i)\, G_j(i)^{-1}\, g_j(i) = [\, g_j(i) \,]^2 / G_j(i) \tag{6.23} \]

Substituting Eq. {6.4} and {6.5} into {6.14} produces
another expression for the reduced version of the
matrix G(i) to the scalar G_j(i).

\[ \begin{aligned}
G_j(i) &= \frac{1}{N}\, \underline{w}_j(i/i-1)^T\, \underline{w}_j(i/i-1) - \frac{1}{N}\, \underline{w}_j(i/i-1)^T\, W(i-1)\, A(i-1)^{-1}\, \frac{1}{N}\, W(i-1)^T\, \underline{w}_j(i/i-1) \\
&= \frac{1}{N}\, \underline{w}_j(i/i-1)^T \left[\, I - \frac{1}{N}\, W(i-1)\, A(i-1)^{-1}\, W(i-1)^T \,\right] \underline{w}_j(i/i-1) \\
&= \frac{1}{N}\, \underline{w}_j(i/i-1)^T\, H(i-1)\, \underline{w}_j(i/i-1)
\end{aligned} \tag{6.24} \]

where

\[ H(i-1) = \left[\, I - \frac{1}{N}\, W(i-1)\, A(i-1)^{-1}\, W(i-1)^T \,\right] \tag{6.25} \]

The matrix H(i-1) is a function of the preceding model,
and not a function of the candidate model term. Therefore
it can be considered as a constant scaling factor for each
candidate term evaluation at any model iteration step.
Matrix H(i-1) is positive semi-definite since the scalar
G_j(i) cannot be negative, and H(i-1) is also idempotent.
Since G_j(i) is a quadratic form, we can use a quadratic
identity [Ref. 18, p. 254], and write it as:

\[ G_j(i) = \mathrm{trace} \left[\, \frac{1}{N}\, \underline{w}_j(i/i-1)\, \underline{w}_j(i/i-1)^T\, H(i-1) \,\right] \tag{6.26} \]

After many attempts, we are still unable to reduce Eq.
{6.26} to a form that can be more efficiently computed.
Based on the properties of matrix H(i-1), and the heuristic
belief that the trace of the matrix in the square brackets
of Eq. {6.26} is an important factor to consider, we make
the following approximation. Justification for this
approximation will be given in a subsequent theorem. Using
Eq. {6.24}, {6.25}, and {6.26}, approximate G_j(i) by its
maximum value, G_j^+(i).

\[ G_j^+(i) = \max G_j(i) = \mathrm{trace} \left[\, \frac{1}{N}\, \underline{w}_j(i/i-1)\, \underline{w}_j(i/i-1)^T \,\right] \tag{6.27} \]

Analysis of Eq. {6.27} and Eq. {6.13} leads to the
recognition that the trace of the matrix in square brackets
of Eq. {6.27} equals search indicator I(j,4).

\[ G_j^+(i) = I(j,4) \tag{6.28} \]

Substituting Eq. {6.28} into Eq. {6.23}, and using
I(j,3) for g_j(i), results in the new search indicator:

\[ I(j,12) = \frac{I(j,3)^2}{I(j,4)} = \frac{\left[ \frac{1}{N}\, \underline{w}_j(i/i-1)^T\, \underline{e}(i-1) \right]^2}{\frac{1}{N}\, \underline{w}_j(i/i-1)^T\, \underline{w}_j(i/i-1)} \tag{6.29} \]

Since w_j(i/i-1) is an N x 1 vector, and e(i-1) is an N x 1
vector obtained at the cost O(PN), the calculation of
I(j,12) requires O(PN+2N+4) operations for the first
candidate model term. For the second and subsequent
candidate model terms the cost is reduced to O(2N+4) since
e(i-1) has already been calculated. Note that I(j,12) is a
normalized version of the square of g_j(i), and therefore
should be a better indicator than I(j,3). It is also
cheaper to calculate than I(j,8). These preceding order of
complexity equations appear in Table 5 and Table 6 along
with some numerical examples.
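
A minimal sketch of how I(j,12) might be evaluated for a set of
candidates is given below. The residual values, candidate
labels, and function name are hypothetical. The sketch also
illustrates one consequence of the normalization by I(j,4):
rescaling a candidate signal by a constant leaves I(j,12)
unchanged, whereas I(j,3) would scale with it.

```python
from typing import Dict, Sequence


def indicator_12(w_j: Sequence[float], e_prev: Sequence[float]) -> float:
    """I(j,12) = I(j,3)^2 / I(j,4) for one candidate term."""
    n = len(w_j)
    i3 = sum(w * e for w, e in zip(w_j, e_prev)) / n  # Eq. {6.12}
    i4 = sum(w * w for w in w_j) / n  # Eq. {6.13}
    return i3 * i3 / i4


# Hypothetical residual of the previous model, computed once and then reused.
e_prev = [0.7, 0.4, 1.1, -1.2]
candidates: Dict[str, Sequence[float]] = {
    "a": [1.0, 0.0, 1.0, 0.0],
    "b": [0.0, 1.0, 0.0, 1.0],
    "c": [1.0, 1.0, 1.0, 1.0],
}

scores = {name: indicator_12(w, e_prev) for name, w in candidates.items()}
best = max(scores, key=lambda name: scores[name])

# Rescaling a candidate signal leaves I(j,12) unchanged.
scaled = indicator_12([10.0 * w for w in candidates["a"]], e_prev)
print(scores, best, scaled)
```

Each call touches only the candidate signal and the stored
residual, so the per-candidate cost does not grow with the size
of the existing model.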




SEARCH                               N=50    N=50   N=100   N=100   N=500   N=500
INDICATOR  COMPLEXITY                 P=5    P=10     P=5    P=50     P=5    P=50

     1     O(N+1)                      51      51     101     101     501     501
     2     O(P^2+NP+N+P+2)            332     662     632    7652    3032   28052
     3     O(NP+N+1)                  301     551     601    5101    3001   25501
     4     O(N+1)                      51      51     101     101     501     501
     5     O(P^2+NP+N+P+2)            332     662     632    7652    3032   28052
     6     O(P^2+2NP+2N+P+4)          634    1214    1234   12754    6034   53554
     7     O(P^2+NP+2N+2P+5)          390     725     740    7805    3540   28605
     8     O(P^2+2NP+2N+P+5)          635    1215    1235   12755    6035   53555
     9     O(NP+P+2)                  257     512     507    5052    2507   25052
    10     O(P^2+NP+P+2)              282     612     532    7552    2532   27552
    11     O(P^2+NP+P+1)              281     611     531    7551    2531   27551
    12     O(NP+2N+4)                 354     604     704    5204    3504   26004


TABLE 5:  Order of complexity (number of multiplications or
divisions) required to compute each search indicator value
for a single candidate model term. Various examples of
model size P and measurement sequence length N are included.


Some of the factors required in the calculation of the
search indicator values of the first candidate model term at
each growth iteration can be reused in the calculation of
other search indicator values for this term and subsequent
model terms. This can be exploited to produce a lower
computational complexity for each candidate model term
beyond the first, as shown in Table 6.




SEARCH     COMPLEXITY              N = 50   N = 50   N = 100  N = 100  N = 500  N = 500
INDICATOR                          P = 5    P = 10   P = 5    P = 50   P = 5    P = 50

    1      O(N+1)                     51       51      101      101      501      501
    2      O(P**2+NP+N+P+2)          332      662      632     7652     3032    28052
    3      O(N+1)                     51       51      101      101      501      501
    4      O(N+1)                     51       51      101      101      501      501
    5      O(P**2+NP+N+P+2)          332      662      632     7652     3032    28052
    6      O(P**2+NP+2N+P+4)         384      714      734     7754     3534    28554
    7      O(P**2+NP+2N+2P+5)        390      725      740     7805     3540    28605
    8      O(P**2+NP+2N+P+5)         385      715      735     7755     3535    28555
    9      O(NP+P+2)                 257      512      507     5052     2507    25052
   10      O(P**2+NP+P+2)            282      612      532     7552     2532    27552
   11      O(P**2+NP+P+1)            281      611      531     7551     2531    27551
   12      O(2N+4)                   104      104      204      204     1004     1004

TABLE 6:   Order of complexity (number of multiplications or
divisions) required to compute each search indicator value
for subsequent model terms beyond the first.  Various examples
of model size P and measurement sequence length N are included.
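
The entries of Table 6 follow directly from the complexity expressions in its second column. As a minimal sketch, the snippet below evaluates a few of those expressions for the six (N, P) pairs used in the table. The names `TABLE6_COST`, `CASES`, and `table6_row` are illustrative only and do not appear in the thesis.

```python
# Illustrative sketch: evaluate selected Table 6 operation counts.
# TABLE6_COST, CASES and table6_row are hypothetical names for this example.

# Operation-count expressions for search indicators 1, 2, 9 and 12,
# as functions of sequence length N and existing model size P.
TABLE6_COST = {
    1: lambda N, P: N + 1,
    2: lambda N, P: P**2 + N * P + N + P + 2,
    9: lambda N, P: N * P + P + 2,
    12: lambda N, P: 2 * N + 4,
}

# The six (N, P) combinations that head the table columns.
CASES = [(50, 5), (50, 10), (100, 5), (100, 50), (500, 5), (500, 50)]


def table6_row(indicator):
    """Return the six tabulated counts for one search indicator."""
    cost = TABLE6_COST[indicator]
    return [cost(N, P) for N, P in CASES]


if __name__ == "__main__":
    for indicator in sorted(TABLE6_COST):
        print(indicator, table6_row(indicator))
```

Note that the row for indicator 12 does not change with P, which is the property the surrounding text relies on.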


Based on the preceding development of I(j,12), we state
and prove the following theorem.


THEOREM 2:   LOWER BOUND ON REDUCTION IN FITTING ERROR

I(j,12) is a lower bound on the improvement in the
fitting error resulting from including the single model term
w_j(n,i/i-1) at the (i)th growth iteration.

PROOF:

From Eq. {6.12}, I(j,3) = g_j(i).  Substituting this into
Eq. {6.23} yields:

$$J_j^2(i/i-1) = \frac{[I(j,3)]^2}{G_j(i)}$$   {6.30}

From the development of Eq. {6.24} and Eq. {6.27}, we see
that G_j(i) can be written as:

$$G_j(i) = a_j(i) - P_j = I(j,4) - P_j$$   {6.31}

where P_j is nonnegative and P_j < I(j,4).  Therefore I(j,4)
is an upper bound on G_j(i).  Applying this last result to
Eq. {6.30} and Eq. {6.29} yields the result that I(j,12) is
a lower bound on J_j^2(i/i-1).


We have shown how the value of I(j,12) is related to the
improvement in the error fitting criterion.  Tables 5 and 6
show that this search indicator can be computed at a very
low computational cost.  In fact, the cost in Table 6 is not
a function of the size P of the existing model, only of the
number N of data measurements.

The power of this new search indicator is significant.
The computer simulated and real world experiments we have
performed indicate that it is an excellent indicator of the
fitting error improvement that results from including the
candidate model term.  Because the value of I(j,12) is
proportional to the square of I(j,3), it rarely happens that
a term with low I(j,12) will have a significantly large
value of I(j,8), the actual fitting error improvement.  The
fact that I(j,12) is easily computed adds to its
significance.

For starting the model growth, we can select the subset
of terms in w(n,1) with the one or two largest values of
I(j,12).  At this first iteration there is no error residual
signal since there is no existing model, so we use the total
model output sequence {y(n)} in place of {e(n,0)}.  While we
have not been able to prove that this manner of specifying
this subset of w(n,1) prevents inclusion of unneeded terms,
results of many experiments show this method provides a good
set of starting terms and generally yields more compact
models.

We have examined the characteristics of search
indicators I(j,1) through I(j,12) under experimental
conditions.  This involved numerous experiments with
synthesized systems, a controlled input probe sequence, and
the assumption of no additive output noise.  A subsequent
section examines the robustness of model growth in cases
where the preceding assumptions are relaxed.

A thirteenth search indicator is designed for a special
purpose.  We previously discussed the potential problem of
nearly equivalent performance from different model terms.
This leads to ill-conditioning of the least squares matrix
and the possibility of multiple solutions.

The thirteenth search indicator is the maximum result
chosen from the set of squared and normalized time averages
obtained from the product of the signal specified by the
candidate model term, and the signals specified from each of
the other q(i) candidate model terms at this iteration.

$$I(j,13) = \max_{\substack{1 \le k \le q(i) \\ k \ne j}} \left\{ \frac{\left[ \frac{1}{N} \sum_{n} w_j(n,i/i-1)\, w_k(n,i/i-1) \right]^2}{\left[ \frac{1}{N} \sum_{n} w_j^2(n,i/i-1) \right] \left[ \frac{1}{N} \sum_{n} w_k^2(n,i/i-1) \right]} \right\}$$   {6.32}
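
The definition of I(j,13) can be illustrated with a short sketch. It assumes each candidate term's signal is available as a plain list of N samples; the function name `indicator_13` and the sample data are hypothetical and chosen only for this example. By the Cauchy-Schwarz inequality the result lies between 0 and 1, and it reaches 1 only when two candidate signals are proportional.

```python
# Illustrative sketch of I(j,13): the largest squared, normalized time
# average of the product of candidate signal j with any other candidate.
# indicator_13 and the sample signals below are assumptions for this example.

def indicator_13(signals, j):
    """Return the maximum normalized squared cross-average for signal j."""
    n = len(signals[j])

    def time_average(a, b):
        # (1/N) * sum over n of a(n) * b(n)
        return sum(x * y for x, y in zip(a, b)) / n

    w_j = signals[j]
    best = 0.0
    for k, w_k in enumerate(signals):
        if k == j:
            continue  # only the *other* candidate terms are compared
        ratio = time_average(w_j, w_k) ** 2 / (
            time_average(w_j, w_j) * time_average(w_k, w_k)
        )
        best = max(best, ratio)
    return best


if __name__ == "__main__":
    candidates = [
        [1.0, 2.0, 3.0, 4.0],
        [2.0, 4.0, 6.0, 8.5],    # nearly proportional to the first signal
        [1.0, -1.0, 1.0, -1.0],  # weakly related to the first signal
    ]
    print(indicator_13(candidates, 0))
```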


Examination of Eq. {4.27} and Eq. {6.32} shows that the
value of I(j,13) equals the maximum ratio of the square of
each off-diagonal element of the (j)th row of A(i/i-1) to
the product of the diagonal elements of the corresponding
column and the (j)th row.  In physical terms, large I(j,13)
means there is the possibility of significant correlation
between the signal specified by the (j)th candidate model
term and the signal specified by another candidate model
term.  This is related to the multiple correlation
coefficient in regression analysis.  The following set of
theorems shows that a necessary condition for the least
squares solution to represent a unique minimum is that
A(i/i-1) be positive definite at each growth iteration.
They also show how this condition is related to the values
of I(j,13).  Theorem 3 is well known in the linear algebra
and matrix literature, and is included for completeness.

THEOREM 3:

The elements of a positive definite matrix D = [d_jk]
must satisfy the inequality:

$$d_{jk}^2 < d_{jj}\, d_{kk} \qquad \text{for all } j \ne k$$   {6.33}

PROOF:

Assume that the converse of {6.33} held for some d_jk.
We could interchange the (1)st column of D with the (j)th
column, the (2)nd column with the (k)th column, the (1)st
row with the (j)th row, and the (2)nd row with the (k)th row
without affecting the definiteness of D.  From the converse
of {6.33}, the determinant of the resulting leading 2x2
principal submatrix of D would now be less than or equal to
zero, and D could not be positive definite.  Therefore
{6.33} is a necessary condition.

THEOREM 4:

A necessary condition for the matrix A(i/i-1) given by
Eq. {4.27} to be positive definite is that the value of
I(j,13) for each of the q(i) terms in the (i)th iteration
must satisfy the following inequality:

I(j,13) < 1     for all j, j = 1, 2, ..., q(i)      {6.34}

PROOF:

From Eq. {4.27}, all diagonal elements of A(i/i-1) are
nonnegative.  From Eq. {6.32} and Theorem 3 we see that
{6.34} is equivalent to {6.33}, and therefore {6.34} is a
necessary condition for A(i/i-1) to be positive definite.


THEOREM 5:

A necessary condition for the uniqueness of the solution
of the normal equations {4.24}, repeated below,

$$A(i)\,\theta(i) = \begin{bmatrix} A(i-1) & B(i/i-1) \\ B(i/i-1)^T & A(i/i-1) \end{bmatrix} \theta(i) = h(i)$$   {6.35}

is that the value of I(j,13) for each of the q(i) terms in
the (i)th iteration must satisfy the following inequality:

I(j,13) < 1     for all j, j = 1, 2, ..., q(i)      {6.36}

PROOF:

From Chapter III, Eq. {3.18} describes the condition
that the least squares matrix A(i) must be positive definite
for J^2(i) to represent a unique minimum.  If A(i) is not
positive definite, the system of equations given by Eq.
{6.35} contains more than one set of solutions that
equivalently minimize the fitting error criterion.  The
least squares error minimization may become extremely
unstable since the minimum will tend to lie on a line or
surface in parameter space, rather than at a point.  From
Theorem 4, Eq. {6.34} is a necessary condition that A(i/i-1)
is positive definite.  It follows directly from the proof of
Theorem 3 that a necessary condition for A(i) to be
positive definite is that A(i/i-1) is also positive
definite.  Therefore we see that {6.34}, or equivalently
{6.36}, is a necessary condition.


Other possible search indicators are contained in the
area of the patterns in the residual sequence of the (i-1)st
model.  We know that when we have completely modeled a
system using noise free measurements, the error residual
should be a random sequence without any trends or patterns.
In fact, we expect that this case will produce an error
residual sequence composed of a series of very short
segments of alternating sign.  It is reasonable to expect
that the pattern of segments we see in the residual when we
have undermodeled the system is representative of the
missing term(s) in the model.  The problem is to learn how
to decode this information from the patterns in the (i-1)st
model, to aid us in selecting the missing term or terms.

C.   SEARCH  INDICATOR  GROWTH  ALGORITHM 

Our proposed Search Indicator Growth Algorithm is
represented in Figure 23.  We start by specifying a very
large set of candidate model terms.  Our algorithm picks the
subset of candidate model terms whose I(j,12) values are
greater than some specified value of the variable h1 (e.g.
70% of the maximum value of I(j,12) for any candidate term).
Before adding the selected term(s) to the model for
subsequent evaluation of the fitting error, we calculate the
value of I(j,13) for each selected term using Eq. {6.32}.  A
second heuristic variable h2 is used to indicate when
significant colinearity is present.  Values of I(j,13) close
to 1 indicate that the (j)th candidate model term (out of
the selected set) is nearly linearly dependent on another
candidate model term.  This other term is used in the
calculation for the (k)th row of A(i/i-1), and contributes
to the large I(j,13).  When I(j,13) is greater than h2, we
discard the candidate model term of this pair that has the
lower value of I(j,12), re-estimate I(j,13) for the
remaining term, and continue until all values of I(j,13) are
sufficiently small (e.g. less than 0.85).

This iterative two-phase growth technique is based on
the terms selected by the search indicators, and has a much
lower computational cost than the complete evaluation of
J^2(i) for each combination of possible new terms.

STEP 1:       ACCEPT SYSTEM INPUT AND OUTPUT MEASUREMENT
              SEQUENCES.  SET MODEL ITERATION INDEX i=1.

STEP 2:       CHOOSE SET OF CANDIDATE MODEL TERMS FOR
              ITERATION i.

STEP 3:       CALCULATE I(j,12) FOR EACH CANDIDATE MODEL TERM.

STEP 4:       SELECT SUBSET OF CANDIDATE MODEL TERMS WITH
              I(j,12) GREATER THAN SOME SELECTED LIMIT h1.

STEP 5:       CALCULATE I(j,13) FOR EACH CANDIDATE MODEL TERM
              IN THE ABOVE SUBSET.  DETERMINE THE RELATED
              PAIRS OF TERMS.

STEP 6:       FOR EACH PAIR OF TERMS WHOSE I(j,13) VALUE IS
              GREATER THAN SOME SELECTED LIMIT h2, DISCARD THE
              MODEL TERM WITH THE LOWER I(j,12) VALUE.
              CONTINUE UNTIL ALL I(j,13) VALUES ARE LESS
              THAN h2.

STEP 7:       EVALUATE THE RESULTING MODEL USING THE RECURSIVE
              EVALUATION AND SOLUTION EQUATIONS.

STEPS 8, 10:  CALCULATE THE VALUES OF ALL MODEL COEFFICIENTS.
              VERIFY THE PREDICTION PERFORMANCE OF THE MODEL
              WITH NEW SYSTEM DATA.  STOP IF PERFORMANCE IS
              ACCEPTABLE, ELSE GO TO STEP 9.

STEP 9:       REALIZE MODEL i AND PRODUCE THE ERROR RESIDUAL
              SEQUENCE {e(n,i)}.  SET i=i+1.

FIGURE 23:   Flow Diagram of the Search Indicator Growth Algorithm
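
The term-selection phase of the algorithm (thresholding on I(j,12), then pruning colinear pairs with I(j,13)) can be sketched as follows. This is a minimal illustration rather than the thesis implementation. It assumes the I(j,12) values are already available in a dictionary and that a caller-supplied function returns the squared, normalized correlation of a pair of terms. The names `select_terms`, `pair_corr`, `h1_fraction`, and the example numbers are all hypothetical.

```python
# Illustrative sketch of the selection and pruning phase.
# All names and numeric values are assumptions made for this example.

def select_terms(i12, pair_corr, h1_fraction=0.70, h2=0.85):
    """Keep terms with large I(j,12), then drop the weaker term of each
    pair whose normalized squared correlation exceeds h2."""
    # Threshold on I(j,12), set here as a fraction of the largest value.
    limit = h1_fraction * max(i12.values())
    selected = [term for term in i12 if i12[term] > limit]

    # Repeatedly find the most strongly related pair above h2 and discard
    # the member with the lower I(j,12).
    while True:
        worst = None
        for pos, a in enumerate(selected):
            for b in selected[pos + 1:]:
                c = pair_corr(a, b)
                if c > h2 and (worst is None or c > worst[0]):
                    worst = (c, a, b)
        if worst is None:
            return selected
        _, a, b = worst
        selected.remove(a if i12[a] < i12[b] else b)


# Hypothetical indicator values for four candidate terms.
example_i12 = {"u(n)": 10.0, "u(n-1)": 9.0, "y(n-1)": 8.0, "u(n)u(n-1)": 2.0}
# Hypothetical pairwise values; unlisted pairs default to 0.10.
example_corr = {frozenset(("u(n)", "u(n-1)")): 0.95}


def example_pair_corr(a, b):
    return example_corr.get(frozenset((a, b)), 0.10)


if __name__ == "__main__":
    print(select_terms(example_i12, example_pair_corr))
```

In this example the term `u(n)u(n-1)` falls below the 70% limit, and `u(n-1)` is discarded because it is highly correlated with `u(n)` and has the lower I(j,12).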

The heuristic variable h1 determines the number of terms
selected for inclusion in the model, and can be set based on
the distribution of the values of the search indicator
I(j,12).  If there is a grouping of terms with high values
for I(j,12), they should probably all be accepted into the
model.  If there are only a few terms with high values of
I(j,12), we should select them all, plus possibly a few more
with slightly lower values of I(j,12).  There is a
disadvantage of selecting h1 too small, since this can
result in the requirement for extra iterations in order to
obtain all of the needed terms in the final model.

The heuristic variable h2 determines the amount of
colinearity allowed between model terms.  If chosen too low,
it will delay or prevent acceptance of actually needed model
terms that happen to be somewhat correlated with existing
model terms.  If chosen too high, it allows extra terms into
the model and thereby increases the ill-conditioning of the
least squares matrix.  We have experimentally found the
range 0.7 <= h2 <= 0.85 to be most effective.

The next section examines the computational cost of
model growth using the techniques discussed to this point.
The result is that model growth using the search indicator
techniques developed in this chapter offers a new and
efficient means of obtaining models.

D.   COMPUTATIONAL  COMPARISON  OF  GROWTH  TECHNIQUES 

This  section  examines  the  algorithms  and  computational 
cost  associated  with  the  model  growth  techniques  presented 
in  this  thesis.   We  use  N  to  denote  the  number  of  data 
points,  c(i)  for  the  number  of  model  terms  in  iteration  i, 
and  q(i)  for  the  number  of  candidate  new  terms  at  this 
iteration.   Therefore  q(i)  =  c(i)-c(i-1).   Each  growth 
technique  is  presented  in  algorithmic  form  as  a  series  of 
steps,  and  the  number  of  multiplicative  or  division 
computations  required  at  each  step  is  indicated.   The 
details  of  these  order  of  complexity  calculations  are  based 
on  the  size  of  the  various  matrices,  vectors,  and  sequences 
used  in  the  model  growth,  and  are  included  in  Appendix  C  for 
the  interested  reader.   The  computational  cost  equation  is 
formed  for  a  full  iteration  at  the  end  of  each  technique, 
and  an  example  is  used  to  emphasize  the  differences  in  the 
growth  techniques. 

Technique 1:   Direct Least Squares

                                                  Computational Cost

Step 1:   Set i = 1, form term vector x(n,i)
Step 2:   Form R(i) using Eq. {4.5}               [N+1]c(i)[c(i)+1]/2
Step 3:   Form r(i) using Eq. {4.6}               c(i)[N+1]
Step 4:   Invert R(i)                             [c(i)**3]/6
Step 5:   Solve for J^2(i) using Eq. {4.3}        [c(i)**2]+c(i)+N+1
Step 6:   If J^2(i) < acceptable level, stop.
          Else:
Step 7:   Set i = i+1, form a new term
          vector w(n,i).  Go to Step 2

Total cost for Steps 1 through 7 is:

O(n)  =  [c(i)**3]/6 + [[N+3]/2][c(i)**2] + [[3N+5]/2]c(i) + N + 1

Example:  N = 500, c(1) = 10, c(2) = 20, c(3) = 30

Iteration     Number of multiplicative operations

    1                  33343
    2                 117485
    3                 253926

TOTAL  =  404754
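
The example figures for Technique 1 can be checked with a few lines of code. The sketch below evaluates the total cost expression term by term. Rounding each fractional term up to a whole operation is an assumption made here because it reproduces the tabulated values; the thesis does not state its rounding rule. The function name `direct_ls_cost` is hypothetical.

```python
# Illustrative check of the Technique 1 example.
# Assumption: each term of the cost expression is rounded up separately.
from math import ceil


def direct_ls_cost(c, N):
    """Operation count for one direct least squares iteration with c terms."""
    terms = [
        c**3 / 6,               # matrix inversion
        (N + 3) / 2 * c**2,     # c**2 contributions
        (3 * N + 5) / 2 * c,    # linear contributions
        N + 1,                  # constant contribution
    ]
    return sum(ceil(t) for t in terms)


if __name__ == "__main__":
    costs = [direct_ls_cost(c, 500) for c in (10, 20, 30)]
    print(costs, sum(costs))
```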

Technique 2:   Block Form Recursive Growth

                                                  Computational Cost

Step 1:   Set i = 1, form term vector x(n,i)
Step 2:   Form R(i) using Eq. {4.5}               [N+1]c(i)[c(i)+1]/2
Step 3:   Form r(i) using Eq. {4.6}               c(i)[N+1]
Step 4:   Invert R(i)                             [c(i)**3]/6
Step 5:   Solve for J^2(i) using Eq. {4.3}        [c(i)**2]+c(i)+N+1
Step 6:   If J^2(i) < acceptable level, stop.
          Else:
Step 7:   Set i = i+1, form a new term
          vector w(n,i/i-1)
Step 8:   Form A(i/i-1) using Eq. {4.27}          [N+1]q(i)[q(i)+1]/2
Step 9:   Form B(i/i-1) using Eq. {4.26}          q(i)P[N+1]
Step 10:  Form h(i/i-1) using Eq. {4.29}          q(i)[N+1]
Step 11:  Form F(i) using Eq. {4.30}              q(i)[P**2]
Step 12:  Form G(i) using Eq. {4.31}              P[q(i)**2]
Step 13:  Form g(i) using Eq. {4.32}              Pq(i)
Step 14:  Invert G(i)                             [q(i)**3]/6
Step 15:  Form k(i) using Eq. {4.33}              q(i)**2
Step 16:  Solve for J^2(i) using Eq. {4.36}       q(i)
Step 17:  If J^2(i) < acceptable level, stop.
          Else:
Step 18:  Form inverse of A(i) using              q(i)[P**2] + P[q(i)**2]
          Eq. {4.37}
Step 19:  Go to Step 7

Cost for Steps 1 through 7 is the same as Technique 1.

Total cost for Steps 8 through 17 is:

O(n)  =  [q(i)**3]/6 + [P+N+3][q(i)**2]/2 + q(i)[NP+[P**2]+2P+[3N+5]/2]

Cost for Step 18 is:  O(n)  =  q(i)[P**2] + P[q(i)**2]

Example:  N = 500, c(1) = 10, c(2) = 20, c(3) = 30

Iteration   Steps    Number of multiplicative operations

    1       1-7      33343
    2       8-18     84542 + 2000 (for Step 18)  =  86542
    3       8-17     138242

TOTAL  =  258127
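
A similar check can be made for the block-form example. The sketch below evaluates the Steps 8 through 17 expression and the Step 18 expression. It assumes that P is the number of terms already in the model (10 at the second iteration, 20 at the third), that q(i) = 10 new terms are added each time, and that each fractional term is rounded up separately. These are assumptions chosen because they reproduce the second-iteration figure and the stated total of 258127. The function names are hypothetical.

```python
# Illustrative check of the Technique 2 example.
# Assumptions: P = size of the existing model, q = 10 new terms per
# iteration, and each term of the expression rounded up separately.
from math import ceil

FIRST_ITERATION_COST = 33343  # Steps 1-7, taken from the Technique 1 example


def block_growth_cost(q, P, N):
    """Operation count for Steps 8-17 when q terms are added to P terms."""
    terms = [
        q**3 / 6,
        (P + N + 3) * q**2 / 2,
        q * (N * P + P**2 + 2 * P + (3 * N + 5) / 2),
    ]
    return sum(ceil(t) for t in terms)


def inverse_update_cost(q, P):
    """Operation count for Step 18 (forming the inverse of A(i))."""
    return q * P**2 + P * q**2


if __name__ == "__main__":
    second = block_growth_cost(10, 10, 500) + inverse_update_cost(10, 10)
    third = block_growth_cost(10, 20, 500)
    print(second, third, FIRST_ITERATION_COST + second + third)
```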

Technique 3:   Search Indicator Growth Algorithm

                                                  Computational Cost

Step 1:   Set i = 1, form term vector x(n,i)
Step 2:   Form R(i) using Eq. {4.5}               [N+1]c(i)[c(i)+1]/2
Step 3:   Form r(i) using Eq. {4.6}               c(i)[N+1]
Step 4:   Invert R(i)                             [c(i)**3]/6
Step 5:   Solve for J^2(i) using Eq. {4.3}        [c(i)**2]+c(i)+N+1
Step 6:   If J^2(i) < acceptable level, stop.
          Else:
Step 7:   Set i = i+1, form a new term
          vector w(n,i/i-1)
Step 8:   Form I(j,12) for each term in           NP + [2N+4]q(i)
          w(n,i/i-1) using Eq. {6.29}
Step 9:   Select the subset of k terms with       No cost
          values of I(j,12) greater than a
          specified level h1.  Reduce the
          vector w(n,i/i-1) to only contain
          this subset of k terms
Step 10:  Form A(i/i-1) using the reduced         k[k+1][N+1]/2
          vector w(n,i/i-1) in Eq. {4.27}
Step 11:  Form B(i/i-1) using the reduced         kP[N+1]
          vector w(n,i/i-1) in Eq. {4.26}
Step 12:  Form h(i/i-1) using the reduced         k[N+1]
          vector w(n,i/i-1) in Eq. {4.29}
Step 13:  Form F(i) using the reduced             k[P**2]
          vector w(n,i/i-1) in Eq. {4.30}
Step 14:  Form G(i) using the reduced             P[k**2]
          vector w(n,i/i-1) in Eq. {4.31}
Step 15:  Form g(i) using the reduced             Pk
          vector w(n,i/i-1) in Eq. {4.32}
Step 16:  Invert G(i)                             [k**3]/6
Step 17:  Form k(i) using Eq. {4.33}              [k**2]
Step 18:  Solve for J^2(i) using Eq. {4.36}       k
Step 19:  If J^2(i) < acceptable level, stop.
          Else:
Step 20:  Form inverse of A(i) using              [P**2]k + P[k**2]
          Eq. {4.37}
Step 21:  Go to Step 7

Cost for Steps 1 through 7 is the same as Technique 1 and
Technique 2.  Total cost for Steps 8 through 19 is:

O(n)  =  [k**3]/6 + [k**2][P+[N+3]/2] + k[2P+PN+[P**2]+[3N+5]/2]
         + NP + [2N+4]q(i)

Cost for Steps 20 through 21 is:

O(n)  =  [P**2]k + P[k**2]

Example:  N = 500, c(1) = 10, c(2) = 20, c(3) = 30.  Let k = 3

Iteration   Steps    Number of multiplicative operations

    1       1-7      33343
    2       8-21     35017 + 390 (for Step 20)  =  35407
    3       8-19     49305

TOTAL  =  118055

Note:   Additional savings can be realized when I(j,13) is
used to eliminate highly colinear terms in Step 9.
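
The second-iteration figure of the Technique 3 example can be reproduced from the cost expressions above. The sketch assumes P = 10 existing terms, q(i) = 10 candidate terms, k = 3 selected terms, and that each fractional term is rounded up separately. The parameter values are inferred from the example rather than stated alongside each figure, and the function names are hypothetical.

```python
# Illustrative check of the second iteration of the Technique 3 example.
# Assumptions: P = 10, q = 10, k = 3, N = 500, and per-term rounding up.
from math import ceil


def siga_growth_cost(k, q, P, N):
    """Operation count for Steps 8-19: screen q candidates, then grow by k."""
    growth_terms = [
        k**3 / 6,
        k**2 * (P + (N + 3) / 2),
        k * (2 * P + P * N + P**2 + (3 * N + 5) / 2),
    ]
    screening = N * P + (2 * N + 4) * q  # cost of computing I(j,12)
    return sum(ceil(t) for t in growth_terms) + screening


def siga_inverse_update_cost(k, P):
    """Operation count for Step 20 (forming the inverse of A(i))."""
    return P**2 * k + P * k**2


if __name__ == "__main__":
    growth = siga_growth_cost(k=3, q=10, P=10, N=500)
    update = siga_inverse_update_cost(k=3, P=10)
    print(growth, update, growth + update)
```

Most of the count comes from the screening term and from forming B(i/i-1), both of which scale with N, while the terms that scale with the number of added terms stay small because k is small.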

The preceding example shows that the Search Indicator
Growth Algorithm can require substantially lower
computational cost than the other two techniques.  Table 7
summarizes the results of the example.  Because of this
lower cost, we can consider a greater number of candidate
terms during each iteration than would be possible with the
direct or block-form techniques.  This increases the
probability that we will consider the terms actually needed
in the model.  The performance of this algorithm will be
demonstrated in the experiments of Chapter VII.


                                Cost of      Cost of      Cost of      Total
Technique                      Iteration 1  Iteration 2  Iteration 3   Cost

1  Direct Least Squares           33343       117485       253926     404754
2  Block Form Recursive           33343        86542       138242     258127
3  Search Indicator Growth        33343        35407        49305     118055


TABLE 7:   Computation Cost (Number of Multiplications or Divisions)
Required in example of Section D, Chapter VI.


E.   FACTORS  AFFECTING  MODEL  EVALUATION  AND  GROWTH 

Chapters  I  and  II  mentioned  that  there  were  two  main 
factors  that  can  limit  the  ability  to  accurately  model  a 
system  from  input  and  output  measurements.   These  are;  (1) 
the  ability  to  control  the  input  signal  applied  to  the 
system,  and  (2)  the  presence  of  output  measurement  noise. 
The  four  permutations  of  these  two  factors  are  represented 
in  the  following  table. 

               SYSTEM INPUT                  OUTPUT MEASUREMENTS

Case    Uncontrollable   Controllable     No Noise   Additive Noise

 1A           X                                            X
 1B           X                               X
 2A                           X               X
 2B                           X                            X

TABLE 8:   System Characterization Conditions

Other  factors  include  the  form  of  the  system  and  the 
model  (e.g.  other  than  BVM),  choice  of  error  minimization 
method,  and  selection  of  sampling  interval  (over  or  under 
sampling  is  a  possibility).   It  is  assumed  that  these  last 
two  factors  are  not  a  problem  in  the  examples  we  consider. 

We  have  been  primarily  concerned  with  Case  2A  in  this 
thesis  because  it  allows  us  to  focus  on  just  the  choice  of 
model  terms.   In  the  computer  simulated  experiments,  we 
generate an input probe using a uniformly distributed
pseudo-random number generator.  The amplitude values of
this  sequence  are  scaled  to  cover  the  known  (or  assumed) 
operating  range  of  the  nominal  system  input.   We  then  apply 
this  input  sequence  to  the  system,  and  use  the  resulting 
output  sequence  along  with  the  input  probe  sequence  to  grow 
the  model  by  any  of  the  techniques  discussed  in  the 
preceding  chapters. 

A uniform distribution was chosen for the input probe
rather than the gaussian distribution typically mentioned in
the literature, based on the following argument.  Nonlinear
terms contribute to the output sequence in a nonlinear
amplitude dependent manner.  Since we don't know the form of
the nonlinear system terms, we select an input probe that is
equally likely to take on any value in the allowed range.
One could, of course, postulate system examples where a
nonuniform input probe amplitude distribution provides more
efficient model growth (e.g. more significant differences in
the behavior of the candidate model terms).

Case 2B has additive output noise contaminating the
system output sequence {y(n)}.  This is the next step
towards the situation we must face in the real world.  If it
is reasonable to consider this additive noise to be zero
mean, stationary, and uncorrelated with the system input,
then we can perform some filtering to reduce the distortion
examined in Chapter III.  Since we still control the input
sequence, we can measure and record the noisy output
sequence for M repeated applications of the identical input
sequence.  A point-for-point ensemble average of the noisy
system output can then be performed, which reduces the
noise variance of {y(n)} by the factor M.  This filters the
output variation due to the additive noise, and we can grow
a model using the input sequence, and the output sequence
corresponding to the average of the noisy output sequences.
This technique has been tried experimentally and produces
improved results.  An example is presented in Chapter VII.
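
The effect of point-for-point ensemble averaging can be illustrated with a small simulation. The sketch below is only a demonstration under stated assumptions: the clean output is an arbitrary sinusoid standing in for the response to a repeated input, the additive noise is zero mean gaussian with unit variance, and the names `ensemble_average`, `variance`, and `demo` are hypothetical.

```python
# Illustrative simulation of ensemble averaging over M repeated runs.
# The clean signal, noise model and all names are assumptions for this example.
import math
import random


def ensemble_average(runs):
    """Average M equally long output records sample by sample."""
    m = len(runs)
    return [sum(samples) / m for samples in zip(*runs)]


def variance(x):
    """Sample variance (normalized by the number of samples)."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


def demo(n=2000, m=16, noise_std=1.0, seed=1):
    """Return noise variance of a single run and of the M-run average."""
    rng = random.Random(seed)
    clean = [math.sin(0.05 * t) for t in range(n)]  # stand-in system output
    runs = [[c + rng.gauss(0.0, noise_std) for c in clean] for _ in range(m)]
    single_noise = [y - c for y, c in zip(runs[0], clean)]
    averaged_noise = [y - c for y, c in zip(ensemble_average(runs), clean)]
    return variance(single_noise), variance(averaged_noise)


if __name__ == "__main__":
    single, averaged = demo()
    print(single, averaged, single / averaged)
```

With M = 16 the averaged record should show a noise variance close to one sixteenth of the single-run value, which is the reduction by the factor M described above.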

Case 1B has no additive output noise but we must work
with the given input sequence (i.e., we cannot probe the
system ourselves).  If the given input sequence is
sufficiently wideband or "persistently exciting" [Ref. 14,
p. 42], then the least squares matrix A(i) will be well
conditioned  at  each  growth  iteration  i,  and  the  growth 
techniques  provide  useful  results.   Each  specific  case  of 
input  signal,  system  output,  and  model  form  must  be  examined 
experimentally  to  determine  if  the  evaluation  equations  are 
well  conditioned.   Examination  of  the  amplitude  distribution 
and  the  empirical  sample  autocorrelation  of  the  particular 
input  sequence  gives  a  qualitative  measure  of  the 
suitability  of  the  available  input  for  systems 
characterization.   Much  work  needs  to  be  done  in  rating  a 
given  input  signal  for  use  in  systems  characterization,  and 
this  is  suggested  as  an  area  for  future  research. 
Ultimately,  it  is  the  value  of  the  obtained  final  model  in 
the  intended  application  that  determines  the  adequacy  of  the 
input  signal  used  in  the  characterization. 

Case  1A  is  the  most  difficult  set  of  conditions  for  any 
model  growth  technique.   Even  if  we  knew  the  exact  form  of 
the  final  model,  and  were  therefore  just  doing  parameter 
estimation,  the  output  noise  would  degrade  the  model  growth 
and  evaluation  error.   We  may  not  obtain  a  useful  system 
characterization  under  these  conditions. 

Case 1A is the situation normally encountered when we
don't have control of the experiment that obtains the
measurement data.  Two possible techniques to try are as
follows.  Using the given input and output measurement
sequences, we could use the Search Indicator Growth
Algorithm until we reach a limiting number of model terms
(e.g. N/10), or until there was no significant improvement
in the fitting error.  At this point we "freeze" the current
model and simulate it on the computer.  By probing this
mathematical model with the given input sequence
{u(n); S<=n<=T}, we can produce the model output sequence
{ŷ(n)}.  Using a nonlinear iterative algorithm such as
Marquardt [Ref. 17], we could perform an iterative nonlinear
analysis in an attempt to refine the parameter estimates and
reduce the magnitude of the output error e(n) = y(n) - ŷ(n).
Using this corrected model, and the least squares matrix and
vector corresponding to it, we could then grow from this
point using the Search Indicator Growth Algorithm.  This
two-phase process could continue until no significant
decrease in J is obtained.

Another proposed technique would be to grow a
nonrecursive model like the VOL(d,m) from the input and
noisy output measurements, using the Search Indicator Growth
Algorithm.  The noisy output data would not distort the
coefficients of a nonrecursive model, and it might be
possible to obtain a reasonable fit.  Since there is noise
added to the system output, a stopping criterion such as
independence of the residual sequence {e(n,i)}, as measured
by its autocorrelation sequence, would make more sense than
the magnitude of the fitting error J^2(i).  When a
nonrecursive model with a limiting number of terms (e.g.
N/10) is obtained, or {e(n)} is found to be uncorrelated,
then a second phase would be used.  The previously
determined nonrecursive model would be used along with the
input signal to produce the model output {ŷ(n)}.  The input
signal {u(n)} and the nonrecursive model output {ŷ(n)} would
then be used to grow a more general and probably more
compact recursive model like the BVM, again using the Search
Indicator Growth Algorithm.  This concept could be expected
to reduce the effect of the additive output noise.  We
denote this as the "N-R" technique because it uses both
nonrecursive and recursive models.
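
A stopping rule based on the residual autocorrelation could be sketched as follows. This is one possible reading, not the procedure used in the thesis. The acceptance bound of 2/sqrt(N) on each normalized autocorrelation value is a common rule of thumb adopted here as an assumption, and the names `sample_autocorrelation` and `looks_uncorrelated` are hypothetical.

```python
# Illustrative residual whiteness check based on the sample autocorrelation.
# The 2/sqrt(N) bound and all names are assumptions made for this example.

def sample_autocorrelation(e, max_lag):
    """Normalized sample autocorrelation of e for lags 1..max_lag."""
    n = len(e)
    mean = sum(e) / n
    d = [v - mean for v in e]
    c0 = sum(v * v for v in d)  # lag-zero term used for normalization
    return [
        sum(d[t] * d[t + lag] for t in range(n - lag)) / c0
        for lag in range(1, max_lag + 1)
    ]


def looks_uncorrelated(e, max_lag=10, bound=None):
    """True when every tested lag stays inside the chosen bound."""
    if bound is None:
        bound = 2.0 / len(e) ** 0.5
    return all(abs(r) <= bound for r in sample_autocorrelation(e, max_lag))


if __name__ == "__main__":
    trending_residual = [float(t) for t in range(100)]  # clear trend remains
    print(looks_uncorrelated(trending_residual))
    print(sample_autocorrelation([1.0, -1.0] * 50, 1))
```

A residual that still contains a trend fails the check at the first few lags, which would indicate that model growth should continue.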

This  concept  is  related  to  a  recently  developed  two- 
stage  least  squares  parameter  estimation  algorithm  for 
linear  systems  [Ref.  42].   The  method  presented  here  is  more 
powerful  since  it  is  applicable  to  model  growth  for 
nonlinear  systems  and  uses  the  efficient  search  indicator 
growth  algorithm  developed  in  the  previous  section. 
Experimental  analysis  of  the  method  discussed  in  this 
section  is  provided  in  Chapter  VII. 

VII.   EXPERIMENTS  IN  SYSTEM  CHARACTERIZATION 

A.   DISCUSSION 

The  preceding  chapter  developed  the  Search  Indicator 
Growth  Algorithm  and  showed  the  computational  advantages 
that  result  from  its  use.   The  next  step  is  the  experimental 
evaluation  of  the  performance  of  this  proposed  algorithm  in 
characterizing  systems.   These  evaluations  include 
comparisons  with  the  performance  of  the  block-form 
techniques  developed  in  Chapter  V. 

This  chapter  contains  several  experiments  designed  to 
demonstrate  the  strengths  and  limitations  of  the  model 
growth  techniques  presented  in  this  thesis.   In  the  first 
six  experiments  we  synthesize  a  given  system  equation  on  a 
computer,  and  generate  a  finite  length  pseudo-random  input 
sequence  {u(n)}  uniformly  distributed  between  chosen 
amplitude  limits.   Each  case  involves  probing  the  system 
equation  with  the  input  sequence  to  create  an  output 
sequence  {y(n)}.   These  input  and  output  sequences  are  then 
used  as  data  points  for  the  model  growth  techniques. 
Various  system  features  and  measurement  noise  conditions  are 
included  for  illustrative  purposes. 

The  advantage  of  using  synthesized  systems  is  that  it 
allows  us  to  examine  the  properties  of  the  model  growth 
techniques  under  conditions  that  do  not  obscure  the  key 
differences.   We  can  more  clearly  see  the  weaknesses  of  some 
techniques  and  verify  how  other  techniques  can  compensate 
for  related  problems.   The  Covariance  error  minimization 
method  is  used  for  each  growth  technique  because  of  its 
superior  performance  (Chapter  III). 

The  third  section  of  this  chapter  examines  the 
capabilities  of  our  best  growth  techniques  on  a  real  world 
example,  where  we  must  work  with  the  single  set  of  available 
measurement  sequences  (Case  1A  in  Chapter  VI).   Verification 
of  the  modeling  results  is  not  as  direct  in  this  case  since 
the  actual  system  equation  is  unknown.   This  real  example 
verifies  some  of  the  inherent  weaknesses  of  model  growth 
techniques  when  we  are  faced  with  Case  1A  conditions.   The 
final  section  summarizes  the  experimental  findings. 

B.   CONTROLLED  EXPERIMENTS 

The  systems  used  in  these  experiments  were  not  selected 
to  bias  the  findings  in  favor  of  any  technique.   We  have  not 
excluded  any  examples  or  experiments  that  produced  contrary 
results. The following set of experiments is intended to
examine fairly the basic properties of the
various model growth techniques. We start these experiments
with Experiment 2, since Experiment 1 is contained in
Chapter III.

Experiment  2 

The  purpose  of  this  experiment  is  to  demonstrate  that 
the  restricted  growth  properties  of  the  block-form  model 
growth  techniques  generally  lead  to  a  high  condition  number 
for  the  least  squares  matrix  A(i).   This  situation  can 
extend  to  the  extreme  point  of  ill-conditioning  where  these 
growth  techniques  fail  to  converge  on  an  adequate  model. 
The  Search  Indicator  Growth  Algorithm  allows  unrestricted 
model  growth,  is  robust  to  ill-conditioning,  and  typically 
finds  an  acceptable  model  when  block-form  techniques  fail. 
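
The link between nearly dependent model terms and a high condition number can be illustrated with a small numerical sketch. The example is not taken from the experiments: the three regressor columns, the perturbation level of 1e-3, the random seed, and the use of NumPy's condition number on the regressor matrix (rather than on the matrix A(i) defined in the earlier chapters) are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200
u = rng.uniform(-2.0, 2.0, N)   # uniform probe between -2 and +2

# Two unrelated regressor columns.
x1 = u[1:]                      # u(n)
x2 = u[:-1]                     # u(n-1)

# Hypothetical third column that is almost a scaled copy of x1.
x3 = 0.5 * x1 + 1e-3 * rng.standard_normal(N - 1)

A_small = np.column_stack([x1, x2])
A_big = np.column_stack([x1, x2, x3])

# 2-norm condition number: ratio of largest to smallest singular value.
cond_small = np.linalg.cond(A_small)
cond_big = np.linalg.cond(A_big)
print(f"two columns:   {cond_small:.3e}")
print(f"three columns: {cond_big:.3e}")
```

The third column adds almost no new information, yet the condition number rises sharply once it is included. This is the behavior attributed above to restricted block-form growth.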

We synthesize the following nonlinear system.

y(n) = 1.0 u(n) + .8 u(n-1) + .6 u(n-2) - .9 y(n-1)
       - .7 y(n-2) + .4 u(n)u(n) - .2 u(n-1)u(n-1)y(n-3)
       - .1 y(n-1)y(n-2)y(n-3) - .12 u(n)y(n-3)y(n-3)     {7.1}

A random input probe {u(n); 1 <= n <= 200} is generated, uniformly
distributed between the amplitude limits of -2 and +2. The
system output sequence {y(n)} is produced by probing the
system of Eq. {7.1} with the input sequence {u(n)}.
Starting with evaluation of the base model BVM(1,1), we
recursively grow models by each of the six block-form growth
techniques of Chapter V (see footnote 9) and the Search Indicator
Growth Algorithm of Chapter VI. The condition number and error fit
for each model are evaluated at each iteration, and the
results are presented in Table 9.

9   Both the "M Directed" and "D Directed" growth
algorithms require a significance test for switching between
their two phases (see Figure 21 and the discussion in
Chapter V). For clarity of presentation, we assume that
there is a test that recognizes the place to change phases
after going one increment too far (e.g. we turn after m=3 or
d=3, respectively). The tables for each of the following
experiments show where these phase changes are made.

We also include the results of a direct least squares model
evaluation using the exact form of the system as a
comparison basis. Table 10 contains
additional details of the more successful characterization
by the Search Indicator Growth Algorithm.

Experiment: 2     Iteration: 1     Candidate Model: BVM(1,1)

Number of candidate model terms, q(i) = 3

Candidate Model Terms After First Phase Reduction:

#     Term                  I(j,12)      I(j,13)      Related to Term

1*    u(n)                  .6583E+00
2*    y(n-1)                .3040E-01
3*    u(n-1)                .2155E-02

Number of terms in final subset (marked with *), N_f = 3
Total number of terms in resulting model, c(i) = 3
Condition Number of least squares matrix A(i), N_c = .1047E+02
Square root of the fitting error, J(i) = .5183E+00
Remarks: We chose to select all the candidate model terms.


Experiment: 2     Iteration: 2     Candidate Model: BVM(3,3)

Number of candidate model terms, q(i) = 116

Candidate Model Terms After First Phase Reduction:

#     Term                  I(j,12)      I(j,13)      Related to Term

1*    u(n)u(n)              .1073E+00

Number of terms in final subset (marked with *), N_f = 1
Total number of terms in resulting model, c(i) = 4
Condition Number of least squares matrix A(i), N_c = .1125E+02
Square root of the fitting error, J(i) = .3941E+00
Remarks: This term had an I(j,12) more than twice as large as
all other terms, so only this term was selected.

TABLE 10: Search Indicator Growth Algorithm results of
Experiment 2.


Experiment: 2     Iteration: 3     Candidate Model: BVM(3,3)

Number of candidate model terms, q(i) = 115

Candidate Model Terms After First Phase Reduction:

#     Term                  I(j,12)      I(j,13)      Related to Term

1*    y(n-2)y(n-2)y(n-2)    .2945E-01    .9583E+00    u(n-2)y(n-2)y(n-2)
2*    y(n-2)                .2701E-01    .7454E+00    u(n-2)y(n-2)y(n-2)
3     u(n-2)y(n-2)y(n-2)    .2410E-01    .9583E+00    y(n-2)y(n-2)y(n-2)
4*    u(n-1)u(n-1)u(n-3)    .2237E-01    .1137E-01    u(n)u(n)y(n-2)
5*    y(n-1)y(n-2)y(n-3)    .1975E-01    .8015E+00    u(n-2)y(n-1)y(n-3)
6*    u(n-2)y(n-1)y(n-3)    .1956E-01    .8015E+00    y(n-1)y(n-2)y(n-3)
7*    u(n)u(n)y(n-2)        .1895E-01    .5954E+00    y(n-2)
8*    u(n-1)u(n-1)          .1822E-01    .6420E-01    u(n-2)y(n-2)y(n-2)

Number of terms in final subset (marked with *), N_f = 7
Total number of terms in resulting model, c(i) = 11
Condition Number of least squares matrix A(i), N_c = .1311E+03
Square root of the fitting error, J(i) = .2346E+00
Remarks: The first phase picked terms with I(j,12) > .17E-01
and the second phase kept terms with I(j,13) < 0.90

Experiment: 2     Iteration: 4     Candidate Model: BVM(3,3)

Number of candidate model terms, q(i) = 108

Candidate Model Terms After First Phase Reduction:

#     Term                  I(j,12)      I(j,13)      Related to Term

1*    u(n)y(n-3)y(n-3)      .8122E-02    .9072E+00    u(n)u(n-3)y(n-3)
2*    u(n-2)u(n-2)u(n-2)    .6511E-02    .8535E+00    u(n-2)
3*    u(n-2)                .6317E-02    .8535E+00    u(n-2)u(n-2)u(n-2)
4     u(n)u(n-3)y(n-2)      .5187E-02    .9072E+00    u(n)y(n-3)y(n-3)
5*    u(n-1)u(n-1)u(n-2)    .5140E-02    .6022E+00    u(n)

Number of terms in final subset (marked with *), N_f = 4
Total number of terms in resulting model, c(i) = 15
Condition Number of least squares matrix A(i), N_c = .3709E+03
Square root of the fitting error, J(i) = .8208E-01
Remarks: The first phase picked terms with I(j,12) > 0.50E-02
and the second phase kept terms with I(j,13) < 0.90

Experiment: 2     Iteration: 5     Candidate Model: BVM(3,3)

Number of candidate model terms, q(i) = 108

Candidate Model Terms After First Phase Reduction:

#     Term                  I(j,12)      I(j,13)      Related to Term

1*    u(n-1)u(n-1)y(n-3)    .9677E-03    .6222E+00    u(n-1)y(n-1)y(n-3)
2*    u(n-1)y(n-1)y(n-3)    .5825E-03    .6222E+00    u(n-1)u(n-1)y(n-3)

Number of terms in final subset (marked with *), N_f = 2
Total number of terms in resulting model, c(i) = 17
Condition Number of least squares matrix A(i), N_c = .6262E+03
Square root of the fitting error, J(i) = .1037E-05
Remarks: The first phase picked terms with I(j,12) > 0.50E-03
and the second phase kept terms with I(j,13) < 0.90


Table 9 indicates that two of the block-form techniques
produced excessively ill-conditioned least squares matrices
that could not be solved. Matrix G(i), given by Eq.
{4.31}, became singular and this stopped the evaluation.
The other four block-form techniques had very high condition
numbers but were able to converge on the BVM(3,3), which
subsumes the system of Eq. {7.1}. Each case produced a
considerable number of unnecessary model terms, and many had
significant coefficient values (as large as .10E-01). It
would be difficult to identify these terms as unnecessary
without having knowledge of the system equation. Both the
square root of the fitting error and the condition number of
the least squares matrices corresponding to the models from
each of these block-form techniques are much larger than in the
exact model case. This leads us to question the value of
the resulting models.

The search indicator technique did not suffer from these
problems, and settled on the model equation described below.

y(n) =   1.0000E+0 u(n) + 0.7999E+0 u(n-1) + 0.6000E+0 u(n-2)
       - 0.8999E+0 y(n-1) - 0.7000E+0 y(n-2) + 0.4000E+0 u(n)u(n)
       - 0.2000E+0 u(n-1)u(n-1)y(n-3) - 0.1000E+0 y(n-1)y(n-2)y(n-3)
       - 0.1200E+0 u(n)y(n-3)y(n-3) - 0.2384E-6 u(n-1)u(n-1)
       + 0.1490E-6 u(n-1)u(n-1)u(n-1) - 0.2962E-6 y(n-2)y(n-2)y(n-2)
       - 0.6985E-7 u(n)u(n)y(n-2) - 0.5765E-6 u(n-2)y(n-1)y(n-3)
       - 0.3073E-6 u(n-1)u(n-1)u(n-2) - 0.1612E-5 u(n-2)u(n-2)u(n-2)
       - 0.1219E-6 u(n-1)y(n-1)y(n-3)     {7.2}

It is obvious that we can ignore the terms beyond the ninth
term in Eq. {7.2}. Table 9 shows that the square root of
the fitting error from the Search Indicator Growth Algorithm
was more than three orders of magnitude lower than any
error obtained by the block-form techniques. The condition
number and fitting error produced by this algorithm are
realistically close to the values produced by direct
analysis of the exact model. The square root of the fitting
error for the exact model was not exactly zero, which
indicates that some numerical roundoff error existed in the
computer program. We actually computed J^2(i) and then took
the square root. The non-zero value for the square root of
the exact model fitting error, J(i) = .4176E-6, translates to
a J^2(i) of .1744E-12, which is within the expected numerical
range of zero for the computer.

Table  10  shows  the  operation  of  the  Search  Indicator 
Growth  Algorithm.   Notice  how  rapidly  this  technique 
selected  the  critical  model  terms.   The  line  titled 
"Remarks"  gives  a  summary  of  the  heuristic  decision  making 
rules  used  for  acceptance  of  the  particular  candidate  model 
terms  in  each  phase  of  the  algorithm. 
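
The two-phase acceptance summarized in the "Remarks" lines can be sketched in a few lines of code. The rows below are transcribed from iteration 3 of Table 10. The selection rule itself is only one reading that is consistent with the starred entries, not necessarily the rule used in the original program: phase one keeps terms whose I(j,12) exceeds a threshold, and phase two drops a term when its I(j,13) reaches the limit and its related term has already been kept. The function name and the phase-one threshold of 0.017 (just below the smallest listed value) are assumptions.

```python
# (term, I(j,12), I(j,13), related term), from Table 10, iteration 3.
rows = [
    ("y(n-2)y(n-2)y(n-2)", 0.2945e-01, 0.9583e+00, "u(n-2)y(n-2)y(n-2)"),
    ("y(n-2)",             0.2701e-01, 0.7454e+00, "u(n-2)y(n-2)y(n-2)"),
    ("u(n-2)y(n-2)y(n-2)", 0.2410e-01, 0.9583e+00, "y(n-2)y(n-2)y(n-2)"),
    ("u(n-1)u(n-1)u(n-3)", 0.2237e-01, 0.1137e-01, "u(n)u(n)y(n-2)"),
    ("y(n-1)y(n-2)y(n-3)", 0.1975e-01, 0.8015e+00, "u(n-2)y(n-1)y(n-3)"),
    ("u(n-2)y(n-1)y(n-3)", 0.1956e-01, 0.8015e+00, "y(n-1)y(n-2)y(n-3)"),
    ("u(n)u(n)y(n-2)",     0.1895e-01, 0.5954e+00, "y(n-2)"),
    ("u(n-1)u(n-1)",       0.1822e-01, 0.6420e-01, "u(n-2)y(n-2)y(n-2)"),
]


def two_phase_select(rows, i12_min, i13_max):
    """Hypothetical two-phase selection over (term, I12, I13, related) rows."""
    # Phase 1: keep terms whose performance indicator exceeds the threshold.
    survivors = [r for r in rows if r[1] > i12_min]
    # Visit the strongest terms first so they win any dependency conflict.
    survivors.sort(key=lambda r: r[1], reverse=True)
    kept = []
    for term, _i12, i13, related in survivors:
        # Phase 2: skip a term that is strongly related to one already kept.
        if i13 >= i13_max and related in kept:
            continue
        kept.append(term)
    return kept


kept = two_phase_select(rows, i12_min=0.017, i13_max=0.90)
print(len(kept), kept)
```

Under this reading, seven terms survive and u(n-2)y(n-2)y(n-2) is the one rejected, which matches the entries marked with * in that iteration.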

This  experiment  demonstrated  the  weakness  of  the  block 
form  techniques  resulting  from  their  restricted  form  of 
model  growth.   It  is  logical  to  expect  that  as  one 
arbitrarily  adds  more  and  more  sets  of  model  terms,  the 
probability  increases  that  two  or  more  terms  will  be  nearly 
linearly  dependent  (colinear).   This  would  result  in  a  large 
increase  in  the  condition  number  of  the  least  squares  matrix 
A(i). This conjecture was also tested by evaluating search
indicator I(j,13) for all of the terms added at each growth
iteration by the block-form techniques. In all of these
cases, there were numerous occurrences of I(j,13) values
greater than 0.90, and this appears to explain the observed
ill-conditioning. Any growth techniques that do not check
for and somehow handle colinearity among the model terms
will have similar problems in characterizing systems. The
Search Indicator Growth Algorithm effectively handles this
problem with search indicator I(j,13).

Experiment 3

This  experiment  examines  the  performance  of  the  various 
growth  techniques  when  the  system  under  test  actually  has  a 
significant  delay  factor  L  (previously  discussed  in  Chapter 
VI).   The  block-form  techniques  do  not  have  any  provision 
for  recognizing  this  condition  during  the  growth  iterations, 
and therefore include unnecessary model terms.

We synthesize the following nonlinear system.

y(n) = 1.0 u(n-4) + .8 u(n-5) - .4 y(n-1) + .15 u(n-5)y(n-2)     {7.3}

Using the same input sequence (length N = 200) as Experiment
2, we probe Eq. {7.3} to produce the system output sequence
{y(n)}. We grow models by the M Directed, D Directed,
Neighbor, and Search Indicator techniques. The other three
block-form techniques would require more than the available
200 measurements to evaluate a BVM(2,5) model, and it was
decided not to include them.
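
The probing step and a direct least squares fit on the exact model form can be sketched as follows for Eq. {7.3}. This is a minimal illustration under stated assumptions: the random seed, the zero initial conditions, and the definition of J as the root of the summed squared residual are choices made here, and the input sequence is not the one used in the original experiments.

```python
import numpy as np


def simulate_eq_7_3(u):
    """Probe Eq. {7.3} with input u, assuming zero initial conditions."""
    y = np.zeros(len(u))
    for n in range(len(u)):
        u4 = u[n - 4] if n >= 4 else 0.0
        u5 = u[n - 5] if n >= 5 else 0.0
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y[n] = 1.0 * u4 + 0.8 * u5 - 0.4 * y1 + 0.15 * u5 * y2
    return y


rng = np.random.default_rng(1)           # assumed seed
u = rng.uniform(-2.0, 2.0, 200)          # uniform probe, N = 200
y = simulate_eq_7_3(u)

# Regressors for the exact model form, using samples with full history.
n = np.arange(5, len(u))
X = np.column_stack([u[n - 4], u[n - 5], y[n - 1], u[n - 5] * y[n - 2]])

coef, *_ = np.linalg.lstsq(X, y[n], rcond=None)
J = np.sqrt(np.sum((y[n] - X @ coef) ** 2))   # assumed definition of J
cond = np.linalg.cond(X)

print("coefficients:", coef)
print("J =", J, " condition number =", cond)
```

With noise-free data the four coefficients are recovered to within roundoff, and J is small but not exactly zero, as noted for the exact model case in Experiment 2.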

We started the Search Indicator Growth Algorithm by
initially considering the candidate terms in BVM(1,9), the
highest memory linear model that could be handled by the
computer program. The largest value of I(j,12) was used to
specify which term to include in the first model. Using the
degree and memory of this first selected term, we
heuristically consider the candidate set specified by the
BVM with one increase in degree and one increase in memory.
The condition number and error fit for each model are
evaluated at each iteration, and the results are presented
in Table 11. Table 12 contains the full details of the more
compact characterization by the Search Indicator Growth
Algorithm.

Experiment: 3     Iteration: 1     Candidate Model: BVM(1,9)

Number of candidate model terms, q(i) = 19

Candidate Model Terms After First Phase Reduction:

#     Term                  I(j,12)      I(j,13)      Related to Term

1*    u(n-4)                .1409E+01

Number of terms in final subset (marked with *), N_f = 1
Total number of terms in resulting model, c(i) = 1
Condition Number of least squares matrix A(i), N_c = .1000E+01
Square root of the fitting error, J(i) = .5803E+00
Remarks: We picked the one candidate model term from
the candidate set with the highest I(j,12).


Experiment: 3     Iteration: 2     Candidate Model: BVM(2,5)

Number of candidate model terms, q(i) = 76

Candidate Model Terms After First Phase Reduction:

#     Term                  I(j,12)      I(j,13)      Related to Term

1*    u(n-5)                .2278E+00    .7982E+00    y(n-1)
2*    y(n-1)                .1029E+00    .7982E+00    u(n-5)

Number of terms in final subset (marked with *), N_f = 2
Total number of terms in resulting model, c(i) = 3
Condition Number of least squares matrix A(i), N_c = .1802E+02
Square root of the fitting error, J(i) = .2316E+00
Remarks: The first phase picked terms with I(j,12) > 0.70E-01
and the second phase kept terms with I(j,13) < 0.90

TABLE 12: Search Indicator Growth Algorithm results of
Experiment 3.


Experiment: 3     Iteration: 3     Candidate Model: BVM(2,5)

Number of candidate model terms, q(i) = 74

Candidate Model Terms After First Phase Reduction:

#     Term                  I(j,12)      I(j,13)      Related to Term

1*    u(n-5)y(n-2)          .5363E-01    .8158E+00    y(n-1)y(n-2)
2*    y(n-1)y(n-2)          .4383E-01    .8158E+00    u(n-5)y(n-2)

Number of terms in final subset (marked with *), N_f = 2
Total number of terms in resulting model, c(i) = 5
Condition Number of least squares matrix A(i), N_c = .3206E+02
Square root of the fitting error, J(i) = .6653E-06
Remarks: The first phase picked terms with I(j,12) > 0.10E-01
and the second phase kept terms with I(j,13) < 0.90

This  experiment  shows  that  the  Search  Indicator  Growth 
Algorithm  can  provide  a  better  conditioned  solution  (over  2 
orders  of  magnitude  lower)  than  the  other  growth  techniques 
when  the  system  has  a  significant  delay  factor  L.   The 
block-form  techniques  used  in  this  experiment  converged  on  a 
larger  model  with  reasonably  small  fitting  error.   These 
solutions  however  had  higher  condition  numbers  and  required 
a  significantly  larger  number  of  multiplicative  operations. 
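
The first-iteration choice in this experiment can be mimicked with a simple stand-in for the search indicator. I(j,12) is defined in Chapter VI and is not reproduced here; the squared normalized correlation between each candidate term and the output is used below purely as an assumed substitute. The seed and the zero initial conditions are also assumptions. The candidate set is the 19 linear terms of BVM(1,9).

```python
import numpy as np

rng = np.random.default_rng(2)            # assumed seed
N = 200
u = rng.uniform(-2.0, 2.0, N)

# Probe Eq. {7.3}, assuming zero initial conditions.
y = np.zeros(N)
for n in range(N):
    u4 = u[n - 4] if n >= 4 else 0.0
    u5 = u[n - 5] if n >= 5 else 0.0
    y1 = y[n - 1] if n >= 1 else 0.0
    y2 = y[n - 2] if n >= 2 else 0.0
    y[n] = 1.0 * u4 + 0.8 * u5 - 0.4 * y1 + 0.15 * u5 * y2

# Linear candidates: u(n), ..., u(n-9) and y(n-1), ..., y(n-9).
m = 9
idx = np.arange(m, N)
candidates = {("u(n)" if k == 0 else f"u(n-{k})"): u[idx - k]
              for k in range(m + 1)}
candidates.update({f"y(n-{k})": y[idx - k] for k in range(1, m + 1)})

target = y[idx]


def score(x):
    """Squared normalized correlation with the output (assumed stand-in)."""
    return (x @ target) ** 2 / ((x @ x) * (target @ target))


ranked = sorted(candidates, key=lambda name: score(candidates[name]),
                reverse=True)
print(ranked[:3])
```

The top-ranked candidate is the delayed input u(n-4), so a single pass over the linear terms already exposes the delay factor L before any nonlinear terms are considered.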

The search indicator algorithm converged on the
following model equation:

y(n) = 1.0000E+0 u(n-4) + .8000E+0 u(n-5) - .4000E+00 y(n-1)
       + .1897E-7 u(n-1)u(n-2) + .1500E+0 u(n-5)y(n-2)     {7.4}

We experimentally developed the technique used to
specify the subsequent sets of candidate model terms based
on the terms selected in the first iteration. It is denoted
as the Candidate Model Specification Technique in the work
that follows. This heuristic technique works well, but it is
acknowledged that there undoubtedly are cases where it may
fail to specify a suitably inclusive set of candidate model
terms. The resulting model may be suboptimal in these
cases, and other candidate model term specification
techniques need to be considered.

The  major  strength  of  the  Search  Indicator  Growth 
Algorithm  is  its  ability  to  efficiently  select  the  best 
performing  model  terms  from  the  candidate  set.   It  is 
important to ensure that the candidate set is large enough.
There is no known way to guarantee ahead of time that this
goal is met. It remains necessary for the user of this
algorithm to recognize when, and if, the candidate set is to
be expanded.

Experiment 4

The  purpose  of  this  fourth  experiment  is  to  show  that 
even  for  linear  systems,  the  Search  Indicator  Growth 
Algorithm  can  provide  more  efficient  systems 
characterization  than  the  widely  used  recursive-in-order 
techniques  like  those  of  Box  and  Jenkins  [Ref.  17].   This  is 
a  simplified  example  of  what  can  also  happen  when  block-form 
techniques  are  used  on  more  general  nonlinear  systems. 

Consider the following linear system equation.

y(n) = 1.0 u(n) + .5 u(n-3) + .3 u(n-8) - .6 y(n-3) - .4 y(n-7)     {7.5}

Using the same input sequence (length N = 200) as Experiment
2, we probe Eq. {7.5} with {u(n)} to produce the system
output sequence {y(n)}. We then grow models by the M
Directed Growth technique (with d=1) and the Search
Indicator Growth Algorithm. Fixing the degree at d=1
reduces the M Directed technique to an equivalent form of
the Box and Jenkins technique. It is obvious that the other
block-form techniques would add many unneeded nonlinear
terms, and they are therefore not considered here.
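
The cost of carrying every term up to a given memory can be sketched numerically for Eq. {7.5}. The example compares the full 17-term linear form with memory 8 against the five terms that actually appear in the system. The seed, the zero initial conditions, and the use of the regressor matrix for the condition number are assumptions of this sketch, and the numbers it prints are not those of Table 13.

```python
import numpy as np


def lag(x, n, k):
    """x(n-k), taken as zero before the start of the record."""
    return x[n - k] if n >= k else 0.0


rng = np.random.default_rng(3)            # assumed seed
N = 200
u = rng.uniform(-2.0, 2.0, N)

# Probe Eq. {7.5}.
y = np.zeros(N)
for n in range(N):
    y[n] = (1.0 * lag(u, n, 0) + 0.5 * lag(u, n, 3) + 0.3 * lag(u, n, 8)
            - 0.6 * lag(y, n, 3) - 0.4 * lag(y, n, 7))

idx = np.arange(8, N)

# Full linear form with memory 8: u(n)..u(n-8) and y(n-1)..y(n-8).
X_full = np.column_stack([u[idx - k] for k in range(9)]
                         + [y[idx - k] for k in range(1, 9)])

# Only the five terms present in Eq. {7.5}.
X_sparse = np.column_stack([u[idx], u[idx - 3], u[idx - 8],
                            y[idx - 3], y[idx - 7]])

coef_sparse, *_ = np.linalg.lstsq(X_sparse, y[idx], rcond=None)
cond_full = np.linalg.cond(X_full)
cond_sparse = np.linalg.cond(X_sparse)

print("sparse coefficients:", coef_sparse)
print("condition numbers (full, sparse):", cond_full, cond_sparse)
```

Because the five columns are a subset of the seventeen, the condition number of the smaller regressor matrix can never exceed that of the full one. A technique that must include every term up to the required memory therefore cannot be better conditioned than one that selects only the needed terms.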

The  condition  number  and  error  fit  for  each  model  are 
evaluated  at  each  iteration,  and  the  results  are  presented 
in  Table  13.   We  also  include  the  results  of  a  direct  least 
squares  model  evaluation  using  the  exact  form  of  the  system 
as  a  comparison  basis.   Additional  details  of  the 
characterization  by  the  Search  Indicator  Growth  Algorithm 
are presented in Table 14.

Summary Results from Experiment 4


Experiment: 4     Iteration: 1     Candidate Model: BVM(1,9)

Number of candidate model terms, q(i) = 19

Candidate Model Terms After First Phase Reduction:

#     Term                  I(j,12)      I(j,13)      Related to Term

1*    u(n)                  .1163E+01

Number of terms in final subset (marked with *), N_f = 1
Total number of terms in resulting model, c(i) = 1
Condition Number of least squares matrix A(i), N_c = .1000E+01
Square root of the fitting error, J(i) = .1254E+01
Remarks: We picked the one candidate model term from
the candidate set with the highest I(j,12).

TABLE 14: Search Indicator Growth Algorithm results of
Experiment 4.


Experiment: 4     Iteration: 2     Candidate Model: BVM(1,9)

Number of candidate model terms, q(i) = 18

Candidate Model Terms After First Phase Reduction:

#     Term                  I(j,12)      I(j,13)      Related to Term

1*    y(n-7)                .1030E+01    .1288E+00    y(n-3)
2*    y(n-3)                .6925E+00    .1288E+00    y(n-7)

Number of terms in final subset (marked with *), N_f = 2
Total number of terms in resulting model, c(i) = 3
Condition Number of least squares matrix A(i), N_c = .2523E+01
Square root of the fitting error, J(i) = .5393E+00
Remarks: The first phase picked terms with I(j,12) > 0.60E+00
and there was no required reduction in phase 2.


Experiment: 4     Iteration: 3     Candidate Model: BVM(1,9)

Number of candidate model terms, q(i) = 16

Candidate Model Terms After First Phase Reduction:

#     Term                  I(j,12)      I(j,13)      Related to Term

1*    u(n-8)                .1272E+00    .3897E+00    y(n-8)
2*    y(n-8)                .1088E+00    .3897E+00    u(n-8)
3*    u(n-3)                .7227E-01    .8644E-02    y(n-8)

Number of terms in final subset (marked with *), N_f = 3
Total number of terms in resulting model, c(i) = 6
Condition Number of least squares matrix A(i), N_c = .1378E+02
Square root of the fitting error, J(i) = .5186E-06
Remarks: The first phase picked terms with I(j,12) > 0.50E-01
and there was no required reduction in phase 2.

This experiment shows that the Search Indicator Growth
Algorithm can converge on an accurate model of a linear
system in fewer iterations than the M Directed Growth
technique. The condition number of the least squares matrix
A(i) is significantly lower, and therefore the variance of
the model coefficients is lower when the Search Indicator
technique is used. We also achieved a lower error and found
that there was little dependency among the final model
terms. The following model was obtained with the Search
Indicator Growth Algorithm.

y(n) = 1.0000E+0 u(n) - .6000E+0 y(n-3) - .4000E+0 y(n-7)
       + .5000E+0 u(n-3) + .3000E+0 u(n-8) - .3847E-7 y(n-8)     {7.6}

The main reason these results were obtained is the
particular form of Eq. {7.5}. The M Directed technique
could  not  take  advantage  of  the  fact  that  there  were 
unnecessary  terms  in  a  full  BVM(1,8)  form,  and  consequently 
had  to  include  all  17  of  the  terms.   Tables  13  and  14  show 
how  the  Search  Indicator  Growth  Algorithm  efficiently 
converged  on  an  adequate  model,  based  on  performance 
evaluation  of  the  set  of  candidate  model  terms. 


11   A  conventional  growth  stopping  criterion  in  the 
literature  [Ref.  17]  is  when  the  fitting  error  J  stops 
decreasing  significantly.   In  this  example,  the  M  Directed 
growth  algorithm  (with  d=1)  could  therefore  indicate  that 
growth  should  stop  at  iteration  4;  this  could  result  in  an 
inferior model.

Experiment 5

This  experiment  shows  how  the  finite  length  of  the 
measurement  sequences  prevents  the  block-form  growth 
techniques  from  converging  to  an  accurate  model.   Chapter  VI 
described  how  these  block-form  techniques  could  not  be  used 
to  evaluate  models  with  more  terms  than  the  number  of  data 
measurements in the sequences. The net effect is that only
a limited set of model terms can be considered, based on the
number of available measurements. The Search Indicator
Growth Algorithm is shown to be unaffected by the size of
the data sequences, and is capable of considering a nearly
unlimited number of candidate model terms. The ability to
efficiently  evaluate  a  very  large  set  of  candidate  model 
terms,  and  cut  down  to  a  small  and  meaningful  subset,  is  one 
of  the  main  strengths  of  the  algorithm. 

We synthesize the following nonlinear system.

y(n) = 1.0 u(n) + .8 u(n-1) + .6 u(n-2) + .45 u(n-3)
       - .9 y(n-1) - .7 y(n-2) - .25 y(n-3) + .1 u(n)u(n-1)u(n-2)
       - .15 y(n-1)y(n-2)y(n-3) - .35 u(n-2)y(n-3)
       + .05 y(n-1)y(n-2) - .18 y(n-2)y(n-2)y(n-3)     {7.7}

We use a random input probe (length N = 100) uniformly
distributed between the limits of -1 and +1. The system
output sequence {y(n)} is produced by probing the system of
Eq. {7.7} with the input sequence {u(n)}. Starting with the
evaluation of the base model BVM(1,1), we recursively grow
the model by each of the six block-form growth techniques of
Chapter V and the Search Indicator Growth Algorithm of
Chapter VI. Whenever a growth technique reaches the point
where insufficient data measurements are available, we stop
the growth. The Candidate Model Specification Technique is
used for the search indicator growth, starting with the
initial model BVM(1,9).
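
The data-length limit can be made concrete by counting the terms in a full candidate model. The sketch below assumes that BVM(d,m) contains every product of degree 1 through d formed from u(n), ..., u(n-m) and y(n-1), ..., y(n-m). That reading is an assumption, but it reproduces the term counts implied by the tables in this chapter (3 for BVM(1,1), 19 for BVM(1,9), 77 for BVM(2,5), and 119 for BVM(3,3)). The function name is hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb


def bvm_terms(d, m):
    """List the terms of BVM(d, m) under the assumed reading above."""
    variables = ["u(n)" if k == 0 else f"u(n-{k})" for k in range(m + 1)]
    variables += [f"y(n-{k})" for k in range(1, m + 1)]
    terms = []
    for degree in range(1, d + 1):
        # Each unordered product of `degree` variables is one model term.
        for combo in combinations_with_replacement(variables, degree):
            terms.append("".join(combo))
    return terms


N = 100   # measurements available in Experiment 5
for d, m in [(1, 1), (1, 9), (2, 5), (3, 3)]:
    count = len(bvm_terms(d, m))
    # Closed form: monomials of degree <= d in 2m+1 variables, minus the constant.
    assert count == comb(2 * m + 1 + d, d) - 1
    print(f"BVM({d},{m}): {count} terms, exceeds N={N}: {count > N}")
```

Under this count, the full BVM(3,3) has 119 terms, more than the 100 measurements available here, so a block-form technique cannot evaluate it directly. The search indicator approach only ever solves for the much smaller selected subset.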

The  condition  number  and  fitting  error  for  each  model 
are  evaluated  at  each  iteration,  and  the  results  are 
presented  in  Table  15.   We  include  the  results  of  a  direct 
least  squares  model  evaluation  using  the  exact  form  of  the 
system  as  a  comparison  basis.   Table  16  contains  additional 
details  of  the  more  successful  characterization  by  the 
Search Indicator Growth Algorithm.

Experiment: 5     Iteration: 1     Candidate Model: BVM(1,9)

Number of candidate model terms, q(i) = 2

Candidate Model Terms After First Phase Reduction:

#     Term                  I(j,12)      I(j,13)      Related to Term

1*    u(n)                  .3986E+00
2*    y(n-2)                .2369E-01

Number of terms in final subset (marked with *), N_f = 2
Total number of terms in resulting model, c(i) = 2
Condition Number of least squares matrix A(i), N_c = .1686E+01
Square root of the fitting error, J(i) = .2878E+00
Remarks: These candidate model terms had values of I(j,12)
that were far greater than those of the other terms.

TABLE 16: Search Indicator Growth Algorithm results of
Experiment 5.


Experiment: 5     Iteration: 2     Candidate Model: BVM(3,3)

Number of candidate model terms, q(i) = 117

Candidate Model Terms After First Phase Reduction:

#     Term                  I(j,12)      I(j,13)      Related to Term

1*    u(n-1)u(n-3)y(n-2)    .2714E-01    .8895E+00    u(n- )y(n-2)y(n-3)
2*    u(n-2)y(n-3)          .2600E-01    .1002E+00    u(n- )u(n-2)u(n-3)
3*    u(n-1)u(n-2)u(n-3)    .2520E-01    .8434E+00    u(n- )u(n-3)y(n-2)
4     u(n-1)y(n-2)y(n-3)    .2484E-01    .8895E+00    u(n- )u(n-3)y(n-2)
5*    u(n-2)u(n-2)y(n-1)    .2339E-01    .1927E+00    u(n- )u(n-2)u(n-3)

Number of terms in final subset (marked with *), N_f = 4
Total number of terms in resulting model, c(i) = 6
Condition Number of least squares matrix A(i), N_c = .1037E+03
Square root of the fitting error, J(i) = .1667E+00
Remarks: The first phase picked terms with I(j,12) > .20E-01
and the second phase kept terms with I(j,13) < 0.85

Experiment: 5     Iteration: 3     Candidate Model: BVM(3,3)

Number of candidate model terms, q(i) = 113

Candidate Model Terms After First Phase Reduction:

#     Term                  I(j,12)      I(j,13)      Related to Term

1*    y(n-3)                .5916E-02    .8253E+00    u(n-3)
2*    u(n-3)                .4575E-02    .8253E+00    y(n-3)
3*    u(n-1)u(n-1)y(n-3)    .4070E-02    .6100E+00    y(n-3)
4*    u(n)u(n)y(n-3)        .3733E-02    .5592E+00    y(n-3)

Number of terms in final subset (marked with *), N_f = 4
Total number of terms in resulting model, c(i) = 10
Condition Number of least squares matrix A(i), N_c = .2329E+03
Square root of the fitting error, J(i) = .1438E+00
Remarks: The first phase picked terms with I(j,12) > 0.3200E
and the second phase kept terms with I(j,13) < 0.85


Experiment: 5     Iteration: 4     Candidate Model: BVM(3,3)

Number of candidate model terms, q(i) = 109

Candidate Model Terms After First Phase Reduction:

#     Term                  I(j,12)      I(j,13)      Related to Term

1*    u(n-1)y(n-3)          .3760E-02    .8650E+00    u(n-1)u(n-3)
2     u(n-1)u(n-3)          .2548E-02    .8650E+00    u(n-1)y(n-3)

Number of terms in final subset (marked with *), N_f = 1
Total number of terms in resulting model, c(i) = 11
Condition Number of least squares matrix A(i), N_c = .2343E+03
Square root of the fitting error, J(i) = .1287E+00
Remarks: The first phase picked terms with I(j,12) > 0.2000E-02
and the second phase kept terms with I(j,13) < 0.85

Experiment: 5     Iteration: 5     Candidate Model: BVM(3,3)

Number of candidate model terms, q(i) = 108

Candidate Model Terms After First Phase Reduction:

#     Term                  I(j,12)      I(j,13)      Related to Term

1*    u(n)u(n-3)y(n-1)      .1194E-02    .8799E+00    u(n)y(n-1)y(n-3)
2     u(n)y(n-1)y(n-3)      .1081E-02    .8799E+00    u(n)u(n-3)y(n-1)
3*    u(n)u(n-2)u(n-3)      .8264E-03    .8426E+00    u(n)u(n-2)y(n-3)
4*    y(n-2)y(n-2)y(n-3)    .7172E-03    .6808E+00    u(n-3)y(n-2)y(n-2)
5*    y(n-1)                .6958E-03    .6116E+00    u(n)u(n)y(n-1)
6*    u(n-3)y(n-2)y(n-2)    .6735E-03    .6808E+00    y(n-2)y(n-2)y(n-3)
7*    u(n)u(n)y(n-1)        .6681E-03    .6116E+00    y(n-1)
8*    u(n-3)y(n-1)y(n-3)    .6619E-03    .9303E+00    y(n-1)y(n-3)y(n-3)
9     y(n-1)y(n-3)y(n-3)    .6322E-03    .9303E+00    u(n-3)y(n-1)y(n-3)
10*   u(n)u(n-2)y(n-3)      .6199E-03    .8426E+00    u(n)u(n-2)u(n-3)

Number of terms in final subset (marked with *), N_f = 8
Total number of terms in resulting model, c(i) = 19
Condition Number of least squares matrix A(i), N_c = .6544E+03
Square root of the fitting error, J(i) = .1006E+00
Remarks: The first phase picked terms with I(j,12) > .6000E-03
and the second phase kept terms with I(j,13) < 0.85

TABLE 16: (continued)

Experiment: 5     Iteration: 6     Candidate Model: BVM(3,3)

Number of candidate model terms, q(i) = 100

Candidate Model Terms After First Phase Reduction:

 #    Term            I(j,12)     I(j,13)     Related to Term
 1*   u(n)u(n-3)      .4679E-03   .4053E-01   u(n-3)y(n-1)
 2*   u(n-3)y(n-1)    .4498E-03   .4053E-01   u(n)u(n-3)
 3*   u(n-3)u(n-3)    .4242E-03   .3640E-01   u(n)u(n-3)

Number of terms in final subset (marked with *), N = 3
Total number of terms in resulting model, c(i) = 22
Condition Number of least squares matrix A(i), N = .6858E+03
Square root of the fitting error, J(i) = .8167E-01
Remarks: The first phase picked terms with I(j,12) > 0.33E-03
and the second phase kept terms with I(j,13) < 0.85


TABLE 16: (continued)

Experiment: 5     Iteration: 7     Candidate Model: BVM(3,3)

Number of candidate model terms, q(i) = 97

Candidate Model Terms After First Phase Reduction:

 #    Term                  I(j,12)     I(j,13)     Related to Term
 1*   u(n-1)u(n-1)u(n-1)    .1280E-03   .8497E+00   u(n-1)
 2*   u(n-1)u(n-3)u(n-3)    .1189E-03   .5654E+00   u(n-1)
 3*   y(n-1)y(n-2)          .1173E-03   .7726E+00   u(n-2)y(n-1)
 4*   u(n)u(n-1)y(n-2)      .1002E-03   .6958E-01   u(n-2)y(n-1)
 5*   u(n-2)y(n-1)          .9235E-04   .7726E+00   y(n-1)y(n-2)
 6*   u(n-2)u(n-2)          .9088E-04   .3713E-01   u(n-1)u(n-3)u(n-3)
 7*   u(n-1)                .9021E-04   .8497E+00   u(n-1)u(n-1)u(n-1)

Number of terms in final subset (marked with *), N = 7
Total number of terms in resulting model, c(i) = 29
Condition Number of least squares matrix A(i), N = .1137E+04
Square root of the fitting error, J(i) = .6570E-01
Remarks: The first phase picked terms with I(j,12) > 0.90E-04
and the second phase kept terms with I(j,13) < 0.85

TABLE 16: (continued)

Experiment: 5     Iteration: 8     Candidate Model: BVM(3,3)

Number of candidate model terms, q(i) = 90

Candidate Model Terms After First Phase Reduction:

 #    Term          I(j,12)     I(j,13)     Related to Term
 1*   u(n)y(n-1)    .1754E-03   .8446E+00   u(n)u(n-1)
 2*   u(n-2)        .1292E-03   .1742E-01   u(n)u(n-1)
 3*   u(n)u(n-1)    .1242E-03   .8466E+00   u(n)y(n-1)

Number of terms in final subset (marked with *), N = 3
Total number of terms in resulting model, c(i) = 32
Condition Number of least squares matrix A(i), N = .1303E+04
Square root of the fitting error, J(i) = .1429E-01
Remarks: The first phase picked terms with I(j,12) > 0.105E-03
and the second phase kept terms with I(j,13) < 0.85


Experiment: 5     Iteration: 9     Candidate Model: BVM(3,3)

Number of candidate model terms, q(i) = 87

Candidate Model Terms After First Phase Reduction:

 #    Term                  I(j,12)     I(j,13)     Related to Term
 1*   y(n-1)y(n-2)y(n-3)    .1032E-04

Number of terms in final subset (marked with *), N = 1
Total number of terms in resulting model, c(i) = 33
Condition Number of least squares matrix A(i), N = .1405E+04
Square root of the fitting error, J(i) = .6545E-02
Remarks: The first phase picked terms with I(j,12) > 0.60E-05


TABLE 16: (continued)

Experiment: 5     Iteration: 10     Candidate Model: BVM(3,3)

Number of candidate model terms, q(i) = 86

Candidate Model Terms After First Phase Reduction:

 #    Term                I(j,12)     I(j,13)     Related to Term
 1*   u(n)u(n-2)y(n-1)    .3232E-05   .8468E+00   u(n)u(n-1)u(n-2)
 2*   u(n)u(n-1)u(n-2)    .2892E-05   .7467E+00   u(n)u(n-2)y(n-1)

Number of terms in final subset (marked with *), N = 2
Total number of terms in resulting model, c(i) = 35
Condition Number of least squares matrix A(i), N = .1966E+04
Square root of the fitting error, J(i) = .3882E-05
Remarks: The first phase picked terms with I(j,12) > 0.15E-06
and the second phase kept terms with I(j,13) < 0.85


TABLE 16: (continued)

Table  15  shows  that  the  first  six  growth  techniques  all 
failed  to  converge  on  an  adequate  model  because  they 
exhausted  the  available  data.   The  Neighbor  Growth  technique 
came  closest  to  generating  an  adequate  model,  but  it  also 
had  to  restrict  its  model  growth  choices. 

Only  the  Search  Indicator  Growth  Algorithm  found  an 
acceptable  model.   Note  that  the  condition  number  of  this 
last  model  (iteration  10)  is  reasonably  close  to  that  of  the 
exact  model  of  the  system,  despite  the  fact  that  we  have 
nearly  three  times  the  required  number  of  model  terms. 

The  Search  Indicator  Growth  Algorithm  considered  a  total 
of  131  different  model  terms  specified  by  our  selection 
technique,  even  though  there  were  only  100  data  points. 
This ability enabled it to locate the best model terms to
accept over the ten growth iterations. There were a number
of iterations where near collinearity was detected, and Table
16 shows how the corresponding term with the lowest value of
I(j,12) was deleted from consideration. This last model
contains 23 extra model terms, but these unnecessary terms
are easily identifiable by their low coefficient estimates.
The equation obtained for this model is:

y(n) = 1.0000E+1 u(n) -.7000E+0 y(n-2) -.3500E+0 u(n-2)y(n-3)

u(n-2)u(n-3) -.6385E-5 u(n-2)y(n-1)
u(n-2)y(n-1) +.4500E+0 u(n-3)
-.1266E-4 u(n-1)u(n-3)y(n-2)
n)y(n-3) -.1116E-4 u(n-1)u(n-1)u(n-1)

+.2841E-4 u(n-1)u(n-3)u(n-3)
u(n-1)y(n-3) -.1U89E-4 u(n-1)y(n-3)
n-2)u(n-3) -.1800E+0 y(n-2)y(n-2)y(n-3)
n)y(n-1) +.2956E-4 u(n)u(n-2)y(n-3)
n-3)y(n-1) -.2190E-4 u(n-3)y(n-1)y(n-3)
y(n-2)y(n-2) +.9621E-6 u(n)u(n-3)
u(n-3) +.1190E-4 u(n-3)y(n-1)
-.1500E+0 y(n-1)y(n-2)y(n-3)
u(n-2) +.5000E-1 y(n-1)y(n-2)
n-1) -.9998E-1 u(n)u(n-1)u(n-2)
n-1)y(n-2) +.6000E+0 u(n-2)
n-1) +.2016E-4 u(n)u(n-2)y(n-1)    {7.8}
Experiment 6

This experiment investigates the degraded model growth
resulting from additive output noise, and the improvement
that can result when we have control over the input
sequence. Chapter VI discussed a method that uses repeated
application of the identical system input, and calculates a
point-for-point ensemble average of the corresponding sets
of system outputs to form an "averaged" system output
sequence {y(n)}. Model growth is then attempted using the
input sequence {u(n)} and this averaged sequence {y(n)}.

We  synthesize  the  following  nonlinear  system. 
y(n) = 1.0 u(n) + .8 u(n-1) - .4 y(n-1)

       + .15 u(n-1)y(n-2) + v(n)      {7.9}

We generate a random input sequence {u(n); 1<=n<=100}
uniformly distributed between the amplitude limits (-2,2),
and a random additive noise sequence {v(n); 1<=n<=100}
uniformly distributed between the amplitude limits (-1,1).
The sequence {v(n)} is produced with a different random seed
and is uncorrelated with the input. We produce the noisy
system output from Eq. {7.9} and grow models by the Search
Indicator  Growth  Algorithm.   Growth  is  halted  when  the 
fitting  error  stops  decreasing  significantly,  or  when  the 
condition  number  jumps  drastically.   These  results  are 
summarized  in  the  first  section  of  Table  17. 

After  reapplying  the  input  probe  to  the  system  a  number 
of  times,  an  ensemble  average  of  the  corresponding  system 
output sequences is performed, and we form the "averaged"
system output sequence {y(n)}. Various experiments are
conducted with an increasing number of output sequences
(ensemble members) used to produce {y(n)}. These results
are  included  in  Table  17  for  4,  10,  40  and  100  ensemble 
member  averages.   A  direct  least  squares  evaluation  of  the 
model,  with  the  exact  form  of  the  system  and  no  measurement 
noise,  is  included  for  comparison  purposes. 
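The averaging procedure described above can be illustrated with a short simulation. The following is only a sketch under stated assumptions, not the program used for the experiment: the noise is taken to enter exactly as written in Eq. {7.9}, the initial conditions are assumed to be zero, and the function names (simulate, ensemble_average, mean_square_deviation) and the random seed are hypothetical. It probes the system repeatedly with the same input, averages the outputs point for point, and compares the average with the noise-free output.

```python
import random

def simulate(u, v):
    """Output of the system of Eq. {7.9} for input u and additive noise v.
    Zero initial conditions are an assumption of this sketch."""
    y = []
    for n in range(len(u)):
        u1 = u[n - 1] if n >= 1 else 0.0
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(1.0 * u[n] + 0.8 * u1 - 0.4 * y1 + 0.15 * u1 * y2 + v[n])
    return y

def ensemble_average(u, members, rng):
    """Point-for-point average of `members` noisy outputs for the same input."""
    total = [0.0] * len(u)
    for _ in range(members):
        v = [rng.uniform(-1.0, 1.0) for _ in u]   # fresh noise for each probe
        for n, yn in enumerate(simulate(u, v)):
            total[n] += yn
    return [t / members for t in total]

def mean_square_deviation(a, b):
    """Mean squared difference between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

rng = random.Random(1)                              # hypothetical seed
u = [rng.uniform(-2.0, 2.0) for _ in range(100)]    # input probe
clean = simulate(u, [0.0] * len(u))                 # noise-free reference

msd = {}
for members in (1, 4, 10, 40, 100):
    averaged = ensemble_average(u, members, rng)
    msd[members] = mean_square_deviation(averaged, clean)
    print(members, msd[members])
```

Because the noise sequences of different probes are independent, the mean square deviation of the averaged output from the noise-free output should shrink roughly in proportion to the reciprocal of the number of ensemble members.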


METHOD NAME         ITERATION  MODEL     NEW    SELECTED  TOTAL  CONDITION  SQUARE ROOT
                                         TERMS  TERMS     TERMS  NUMBER     OF FITTING
                                                                            ERROR, J

SEARCH INDICATOR        1      BVM(1,9)   19       2        2    .1307E+01  .?014E+00
(1 AVERAGE)             2      BVM(2,2)   18       2        4    .3273E+01  .5950E+00
(See Eq. {7.10})        3      BVM(2,2)   16       2        6    .1553E+02  .5550E+00
                        4      BVM(2,2)   14       2        8    .4703E+02  .5392E+00
                        5      BVM(2,2)   12       1        9    .4740E+02  .5370E+00
                        6      BVM(2,2)   11       6       15    .3704E+03  .5221E+00

SEARCH INDICATOR        1      BVM(1,9)   19       2        2    .1307E+01  .3?84E+00
(4 AVERAGES)            2      BVM(2,2)   18       2        4    .1745E+02  .2671E+00
(See Eq. {7.11})        3      BVM(2,2)   16       1        5    .1?66E+02  .1955E+00
                        4      BVM(2,2)   15       1        6    .1846E+02  .1332E+00
                        5      BVM(2,2)   14       1        7    .2340E+02  .1671E+00
                        6      BVM(2,2)   13       2        9    .3672E+03  .1419E+00

SEARCH INDICATOR        1      BVM(1,9)   19       2        2    .1307E+01  .3492E+00
(10 AVERAGES)           2      BVM(2,2)   18       1        3    .2443E+01  .2235E+00
(See Eq. {7.12})        3      BVM(2,2)   17       1        4    .2613E+01  .1448E+00
                        4      BVM(2,2)   16       4        8    .6615E+03  .5956E-01

SEARCH INDICATOR        1      BVM(1,9)   19                     .1307E+01  .3?22E+00
(40 AVERAGES)           2      BVM(2,2)   18                     .2443E+01  .2196E+00
(See Eq. {7.13})        3      BVM(2,2)   17                     .261?E+01  .1350E+00
                        4      BVM(2,2)   16                     .5214E+03  .1617E-01

SEARCH INDICATOR        1      BVM(1,9)   19                     .1307E+01  .3415E+00
(100 AVERAGES)          2      BVM(2,2)   18                     .2444E+01  .2139E+00
(See Eq. {7.14})        3      BVM(2,2)   17                     .2619E+01  .1346E+00
                        4      BVM(2,2)   16                     .5190E+03  .5459E-02

EXACT (NO NOISE) SYSTEM MODEL                                    .2597E+02  .1208E-05
This experiment shows how control over the system input
can be used to reduce the distorting effect of additive
output noise on system characterization. Both the fitting
error and the condition number are reduced as we average
more data sequences. The Search Indicator Growth Algorithm
converges more quickly and on a more compact model. The
averaging technique reduces the variance of the output noise
by a factor equal to the reciprocal of the number of
ensemble averages used. It is interesting to note that
J(i), the square root of the fitting error, dropped by
approximately the same factor.

Equations {7.10}, {7.11}, {7.12}, {7.13} and {7.14} are
the resulting model equations obtained from the last
iteration of the growth tests with 1, 4, 10, 40 and 100
ensemble averages respectively. The actual system equation
is repeated below for comparison.

y(n) = 1.0 u(n) + .8 u(n-1) - .4 y(n-1)

       + .15 u(n-1)y(n-2) + v(n)      {7.9}


One Average:

y(n) = .10149E+1 u(n) +.45434E+0 u(n-1) -.18148E+0 u(n-2)
       [?].20995E+0 u(n-1)y(n-2) +.14323E+0 u(n)u(n)
       [?].19578E-[?] y(n-2) -.24651E-1 u(n-2)u(n-2)
       [?].81504E-[?] u(n-2)y(n-2) -.96112E-1 u(n)u(n-1)
       [?].59792E-[?] u(n-1)u(n-1) -.78196E-1 y(n-1)y(n-2)
       [?].41998E-[?] y(n-2)y(n-2) +.67933E-1 u(n)y(n-1)
       [?].18208E-[?] u(n)y(n-2) -.36490E-1 u(n-1)y(n-1)      {7.10}
Note that the model growth did not select the system term
y(n-1).  The  additive  output  noise  is  degrading  the  model 
growth  capabilities. 

Four Averages:

y(n) = .10179E+1 u(n) +.59065E+0 u(n-1)
       -.94970E-2 u(n-1)u(n-2) -.14650E+0 u(n-2)
       +.46875E-1 u(n)u(n) -.29741E-1 u(n-2)y(n-2)
       -.18258E+0 y(n-1) +.59640E-1 y(n-2)
       +.16499E+0 u(n-1)y(n-2)      {7.11}

Ten Averages:

y(n) = .10058E+1 u(n) +.74664E+0 u(n-1)
       +.15180E+0 u(n-1)y(n-2) -.31808E-1 u(n-2)
       -.34365E+0 y(n-1) +.90579E-2 y(n-2)
       +.16271E-1 u(n)u(n) -.47212E-2 y(n-2)y(n-2)      {7.12}

Forty Averages:

y(n) = .10003E+1 u(n) +.80620E+0 u(n-1)
       +.14979E+0 u(n-1)y(n-2) +.76613E-2 u(n-2)
       -.40594E+0 y(n-1) -.60805E-2 y(n-2)
       +.13304E-2 y(n-2)y(n-2)      {7.13}

One Hundred Averages:

y(n) = .10001E+1 u(n) +.80304E+0 u(n-1)
       +.14990E+0 u(n-1)y(n-2) +.35466E-2 u(n-2)
       -.40295E+0 y(n-1) -.26935E-2 y(n-2)
       +.56228E-3 y(n-2)y(n-2)      {7.14}
The results presented in Table 17, and in the preceding
equations,  clearly  demonstrate  the  significant  improvements 
in  model  growth  and  model  accuracy  that  can  be  obtained  when 
we  can  reduce  the  effect  of  additive  output  noise  by 
averaging.   This  averaging  and  growth  technique  is  useful 
whenever  the  statistics  of  the  additive  output  noise  do  not 
change  during  the  experiment. 
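As remarked for Experiment 5, unnecessary model terms can be recognized by their low coefficient estimates. The following sketch applies that idea to the coefficients of Eq. {7.14}. The cutoff value THRESHOLD is a hypothetical choice made only for illustration; it is not a value taken from the experiments, and a sensible cutoff depends on the scaling of the signals.

```python
# Coefficient estimates of Eq. {7.14} (100 ensemble averages), keyed by term.
model = {
    "u(n)": 1.0001,
    "u(n-1)": 0.80304,
    "u(n-1)y(n-2)": 0.14990,
    "u(n-2)": 0.0035466,
    "y(n-1)": -0.40295,
    "y(n-2)": -0.0026935,
    "y(n-2)y(n-2)": 0.00056228,
}

THRESHOLD = 0.01  # hypothetical cutoff, chosen for this illustration only

# Keep the terms whose coefficient magnitude reaches the cutoff.
kept = {term: c for term, c in model.items() if abs(c) >= THRESHOLD}
dropped = sorted(term for term in model if term not in kept)

print("kept:", kept)
print("dropped:", dropped)
```

With this cutoff the four retained terms are the four terms of the actual system equation {7.9}.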

Experiment  7 

The  purpose  of  this  experiment  is  to  demonstrate  the 
model  growth  improvement  resulting  from  use  of  the  two-stage 
"N-R"  technique  discussed  at  the  end  of  Chapter  VI.   This 
technique  is  applicable  to  model  growth  when  we  have  only 
one  set  of  system  input  and  output  measurement  sequences, 
and  the  output  sequence  contains  additive  noise.   We  also 
develop alternate criteria for evaluating the fit of a
model.

For this experiment, we use the same nonlinear system as
in Experiment 6.

y(n) = 1.0 u(n) + .8 u(n-1) - .4 y(n-1)

       + .15 u(n-1)y(n-2) + v(n)      {7.15}

The input sequence {u(n); 1<=n<=1000} is uniformly
distributed between the amplitude limits (-2,2), and the
additive noise sequence {v(n)} is uniformly distributed
between the amplitude limits (-.2,.2). These two sequences
are uncorrelated, both with themselves and each other.
After generating the noisy system output sequence {y(n)}
corresponding to Eq. {7.15}, we grow a recursive model using
the  Search  Indicator  Growth  Algorithm  and  the  Candidate 
Model  Specification  Technique.   The  additive  output  noise 
degrades  the  growth  but  this  step  is  included  to  show  the 
typical  results  obtained  with  noisy  data  sequences.   This 
first  modeling  example  is  denoted  as  Test  1  and  the  results 
are  summarized  in  Table  18. 

Using the VOL model form of Eq. {2.4} and the Search
Indicator Growth Algorithm, we next grow a nonrecursive,
nonlinear model from the available measurement sequences
(first phase of the N-R technique). We selected the system
given by Eq. {7.15} to be of recursive nonlinear BVM form,
and therefore any finite VOL model produced by our growth
algorithm can only approximate the performance of the
system. Since we used the Search Indicator Growth
Algorithm, the resulting VOL model is more compact than any
block form model using the VOL form. We give extra freedom
(larger d and m) to the candidate VOL model terms to allow
for improved growth performance. The results of this second
modeling example are given as Test 2 in Table 18.

After  evaluating  the  coefficients  of  our  final  VOL  model 
from  the  preceding  growth  step,  we  synthesize  it  on  the 
computer  and  probe  it  with  our  stored  system  input  sequence 
{u(n)}.   The  resulting  model  output  sequence  {y(n)}  is 
stored,  and  used  with  {u(n)}  to  grow  a  recursive  model  of 
the BVM form using the Search Indicator Growth Algorithm
(second phase of the N-R technique). The results of this
third  modeling  example  are  summarized  as  Test  3  in  Table  18. 

We include two direct least squares modeling examples
using the exact form of the system. First we use {u(n)} and
{y(n)} to obtain an evaluation of the correct model using
actual system data. This is summarized as Test 4 in Table
18. Finally we use {u(n)} and {y(n)} to obtain an
evaluation of the correct model using the output data from
the nonrecursive VOL model realization obtained in Test 2.
This is summarized as Test 5 in Table 18.


TEST DESCRIPTION            TOTAL  CONDITION  SQUARE ROOT  MINIMUM    MAXIMUM
                            TERMS  NUMBER     OF FITTING   VALUE OF   VALUE OF
                                              ERROR, J     RESIDUAL   RESIDUAL
                                                           SEQUENCE   SEQUENCE
                                                           {e(n)}     {e(n)}

1: SIGA GROWTH OF A           9     516.7      .1250        -.2893     .3027
   RECURSIVE BVM USING
   NOISY SYSTEM DATA.

2: SIGA GROWTH OF NON-       19     2.3        .1193        -.3048     .3401
   RECURSIVE VOL MODEL
   USING NOISY SYSTEM
   DATA.

3: SIGA GROWTH OF A           7     423.6      .1147        -.2941     .2572
   RECURSIVE BVM USING
   OUTPUT DATA FROM VOL
   MODEL OF TEST 2.

4: DIRECT EVALUATION          4     19.3       .1273        -.2971     .3169
   OF EXACT MODEL USING
   NOISY SYSTEM DATA.

5: DIRECT EVALUATION          4     20.06      .1152        -.2961     .2510
   OF EXACT MODEL USING
   OUTPUT DATA FROM VOL
   MODEL OF TEST 2.

TABLE 18: Summary Results from Experiment 7
Equations {7.16}, {7.17}, {7.18}, {7.19} and {7.20} are the
resulting model equations obtained from the preceding Test 1,
Test 2, Test 3, Test 4 and Test 5, respectively. The actual
system equation is repeated below for comparison.

y(n) = 1.0 u(n) +.8 u(n-1) -.4 y(n-1) +.15 u(n-1)y(n-2) + v(n)      {7.15}

y(n) = .10059E+1 u(n) +.69807E+0 u(n-1) +.10798E-3 u(n-1)u(n-2)
       +.13629E+0 u(n-1)y(n-2) -.79330E-3 u(n-2) -.29862E+0 y(n-1)
       +.35210E-3 y(n-2) -.13068E-3 y(n-2)y(n-2)
       +.14619E-3 u(n-2)u(n-2)      {7.16}

y(n) = .10064E+1 u(n) +.39759E+0 u(n-1) -.16452E+0 u(n-2)
       +.14916E+0 u(n-1)u(n-2) +.64402E-3 u(n-2)
       -.66326E-3 u(n-2)u(n-3) -.26126E-3 u(n-1)u(n-4)
       -.26433E-3 u(n-2)u(n-4) +.28719E-3 u(n-3)u(n-4)
       +.54585E-3 u(n-1)u(n-3) +.85071E-4 u(n-3)u(n-5)
       -.15249E-3 u(n-4) +.12135E-3 u(n-5)
       -.10625E-3 u(n-4)u(n-6) +.59477E-4 u(n)u(n-8)
       +.63433E-4 u(n-1)u(n-5) +.56957E-4 u(n-1)u(n-7)
       +.66455E-4 u(n-2)u(n-5) -.61506E-4 u(n-4)u(n-8)      {7.17}

y(n) = .10050E+1 u(n) +.84321E+0 u(n-1) +.14686E+0 u(n-1)y(n-2)
       +.33216E-1 u(n-2) -.44315E+0 y(n-1) -.20647E-1 y(n-2)
       +.25836E-3 y(n-1)y(n-2)      {7.18}

y(n) = .10059E+1 u(n) +.79303E+0 u(n-1) -.39294E+0 y(n-1)
       +.14394E+0 u(n-1)y(n-2)      {7.19}

y(n) = .10047E+1 u(n) +.81040E+0 u(n-1) -.41073E+0 y(n-1)
       +.14214E+0 u(n-1)y(n-2)      {7.20}
This experiment was designed to show the typical
modeling  improvement  resulting  from  the  two-stage  "N-R" 
growth  algorithm.   The  nonrecursive  VOL  model  growth  phase 
(Test  2)  was  continued  until  a  fitting  error  value  less  than 
the  model  of  Test  1  was  obtained.   Table  18  shows  that  a 
significantly  larger  but  manageable  number  of  model  terms 
are  required  in  this  phase.   The  final  VOL  model  has  a  very 
low  condition  number.   This  is  due  to  both  the  nonrecursive 
nature  of  the  VOL  model  form,  and  the  property  of  the  Search 
Indicator  Growth  Algorithm  which  only  picks  model  terms 
offering  substantial  reduction  in  the  fitting  error. 

Table 18 shows that the model of Test 3 has fewer terms,
lower  condition  number,  and  lower  fitting  error  than  the 
model  of  Test  1,  but  not  by  much.   The  error  of  Test  4  is 
higher  than  Tests  1  through  3,  but  this  is  primarily  due  to 
the  lower  number  of  final  model  terms.   The  error  of  Test  5 
is  the  lowest  for  this  number  of  model  terms. 

We  know  that  the  models  of  Test  4  and  Test  5  should  be 
better  than  the  models  of  Test  1  and  Test  3,  respectively, 
but  it  is  difficult  to  recognize  this  from  the  values  in  the 
table.   The  additive  output  noise  causes  an  offset  in  the 
fitting  error  and  we  cannot  use  just  this  scalar  performance 
criterion J(i) to rate the quality of fit. We instead must
find  some  additional  characteristics  of  our  obtained  model 
fit  to  demonstrate  that  we  have  a  meaningful  and  useful 
growth  technique  when  there  is  additive  output  noise. 
One possible measure of the quality of fit is the
amplitude range of the error residual sequence {e(n)} at the
end of each test. A wide spread between the maximum and
minimum residual values would generally indicate a poor
model fit. Conversely, a small spread (compared to the
spread of the system output sequence {y(n)}) would generally
indicate that we have a good fit. Table 18 includes the
maximum and minimum values of {e(n)} for the last model
obtained  from  each  of  the  five  tests.   There  is  some 
difference  in  the  spread  in  each  of  the  five  tests,  but 
nothing  significant  enough  to  use  as  a  criterion.   The 
magnitude  of  the  additive  noise  masks  these  performance 
properties  and  we  must  look  for  another  characteristic. 
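The spread comparison can be made concrete with the residual extremes listed in Table 18. The short sketch below simply subtracts the minimum from the maximum for each test; the variable names are hypothetical.

```python
# (minimum, maximum) of the residual sequence {e(n)} for Tests 1-5, from Table 18.
residual_range = {
    1: (-0.2893, 0.3027),
    2: (-0.3048, 0.3401),
    3: (-0.2941, 0.2572),
    4: (-0.2971, 0.3169),
    5: (-0.2961, 0.2510),
}

# Spread = maximum residual minus minimum residual.
spread = {test: hi - lo for test, (lo, hi) in residual_range.items()}

for test in sorted(spread):
    print(test, round(spread[test], 4))
```

The five spreads all lie between roughly 0.55 and 0.65, which illustrates why this measure does not separate the models clearly.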

Chapter VI mentioned that the residual error sequence
would be a random sequence without any identifiable trends
or patterns if we have adequately modeled the system. This
condition can be qualitatively evaluated using a standard
statistical technique in the literature [Ref. 17]. The
normalized sample autocorrelation plot {r(k); k = 0,1,2,...}
of a random sequence should approach the following form.

FIGURE 24: Typical autocorrelation plot of a random sequence
Here A equals the square root of the reciprocal of N, the
number of data points used in the error minimization
(N equals the last sample index minus the first, plus one).
For example, N = 1000 produces the value A = 0.032.

Following  the  conventions  of  Chapters  II  and  IV,  the 
following  equation  is  used  for  the  normalized  sample 
autocorrelation  of  a  signal  {s(n)}  at  lag  k. 

\[
r(k) = \frac{\dfrac{1}{N} \displaystyle\sum_{n=n_o}^{n_o+N-1} s(n)\, s(n-k)}{r(0)}
\]                                                        {7.21}


A sequence is typically considered to be random if the
values of {r(k); k=1,2,3,...} lie between ±2A for at least
95% of the normalized autocorrelation plot [Ref. 43].
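The ±2A test can be sketched in a few lines. The following is an illustration under stated assumptions rather than the program used for the experiments: samples before the start of the record are treated as zero, the normalization divides by the lag-zero value so that r(0) = 1, and the function names and the random seed are hypothetical.

```python
import math
import random

def normalized_autocorrelation(s, max_lag):
    """Return [r(0), ..., r(max_lag)] for the sequence s, with r(0) = 1."""
    n = len(s)

    def raw(k):
        # (1/N) * sum of s(n) s(n-k); samples before the record start are
        # treated as zero (an assumption of this sketch)
        return sum(s[i] * s[i - k] for i in range(k, n)) / n

    r0 = raw(0)
    return [raw(k) / r0 for k in range(max_lag + 1)]

def looks_random(s, max_lag, fraction=0.95):
    """True if at least `fraction` of r(1..max_lag) lies within +/- 2A."""
    a = 1.0 / math.sqrt(len(s))              # A = sqrt(1/N)
    r = normalized_autocorrelation(s, max_lag)
    inside = sum(1 for k in range(1, max_lag + 1) if abs(r[k]) <= 2 * a)
    return inside >= fraction * max_lag

rng = random.Random(7)                        # hypothetical seed
noise = [rng.uniform(-0.2, 0.2) for _ in range(1000)]
alternating = [1.0, -1.0] * 500               # strongly patterned sequence

print(round(1.0 / math.sqrt(1000), 3))        # A for N = 1000
print(normalized_autocorrelation(noise, 6))
print(looks_random(alternating, 10))
```

The alternating sequence has autocorrelation values close to plus or minus one at every lag, so it fails the test, while a sequence of independent samples is expected to stay mostly inside the ±2A band.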

The  first  seven  normalized  autocorrelation  values  of  the 
error  residual  sequences  from  each  of  the  five  previous 
tests  have  been  calculated  from  the  experimental  data.   The 
autocorrelation  values  for  the  random  additive  noise 
sequence  {v(n)}  and  the  random  input  sequence  {u(n)}  have 
been  calculated  for  comparison  purposes,  and  these  are 
denoted  as  Test  6  and  Test  7,  respectively.   These 
autocorrelation  values  are  summarized  in  Table  19  and 
graphically  presented  in  Figure  25.   A  split  format 
presentation  is  used  in  Figure  25  to  better  distinguish  the 
different  autocorrelation  plots. 
TEST DESCRIPTION                     NORMALIZED AUTOCORRELATION VALUES
                            r(1)     r(2)     r(3)     r(4)     r(5)     r(6)

1: SIGA GROWTH OF           .2928    .0095    .0018   -.0180   -.1320    .0112
   A RECURSIVE BVM
   USING NOISY
   SYSTEM DATA

2: SIGA GROWTH OF           .0258    .0286    .0020   -.0078   -.0572    .0451
   A NONRECURSIVE
   VOL MODEL USING
   NOISY SYSTEM
   DATA

3: SIGA GROWTH OF           .0677    .0178   -.0127   -.0097   -.0641    .0391
   A RECURSIVE BVM
   USING OUTPUT DATA
   FROM VOL MODEL
   OF TEST 2

4: DIRECT EVALUATION        .3756    .0302   -.0106   -.0243   -.0446    .0111
   OF EXACT MODEL
   USING NOISY
   SYSTEM DATA

5: DIRECT EVALUATION        .0570    .0160   -.0163   -.0092   -.0523    .0385
   OF EXACT MODEL
   USING OUTPUT DATA
   FROM VOL MODEL
   OF TEST 2

6: ADDITIVE NOISE           .0618    .0174   -.0176    .0055   -.0685    .0514
   SEQUENCE

7: INPUT SEQUENCE           .0162    .0542    .0256   -.0283   -.0352    .0504

TABLE 19: Autocorrelation values of various error residual
and other random sequences in Experiment 7
FIGURE 25: Normalized autocorrelation plots for various
sequences in Experiment 7 (labeled curves include the
residual from Test 4, the residual from Test 5, the noise
sequence of Test 6, and the input sequence)
Note the differences between the results of Test 1 and
Test 3 in Table 19 and Figure 25. The residual signal from
Test 1 has value r(1) = .2928, indicating that this signal
is nonrandom. The residual signal from Test 3 has value
r(1) = .0258, and all other autocorrelation values are
between -2A and +2A, indicating that this signal is
reasonably random. This leads to the conjecture that the
model of Test 3 is "better" than the model of Test 1,
because the residual sequence from Test 3 is more random.

This  conjecture  is  further  supported  by  comparing  the 
autocorrelation  data  from  Test  4  and  Test  5;  the  exact  model 
fit cases. The residual signal from Test 4 has value
r(1) = .3756 which indicates that the signal is nonrandom.
The  residual  signal  from  Test  5  is  significantly  more 
random.   The  sample  autocorrelation  data  and  plots 
corresponding  to  the  additive  noise  sequence  (Test  6),  and 
the  input  sequence  (Test  7),  provide  examples  of  how  the 
autocorrelation  values  should  appear  for  typical  random 
sequences.   While  some  differences  can  be  recognized  in  the 
plots  of  Figure  25,  we  would  like  to  have  another  criterion 
that could more clearly indicate which sequence is more
random.

We  developed  a  measure  for  the  randomness  of  a  sequence 
based  on  the  cumulative  distribution  of  runs  of  varying 
lengths.   A  run  of  length  k  is  defined  as  a  contiguous 
sequence  of  k  data  points  with  the  same  sign,  bordered  by 
data points with the opposite sign [Ref. 43]. As an
example, consider the following sequence.

{s(n); 1<=n<=11} = {-.2,-.5,+.3,-.1,+.1,+.3,+.05,-.1,-.2,-.05,-.04}

This has 2 runs of length 1, 1 run of length 2, 1 run of
length 3, and 1 run of length 4.

We define a factor for the "randomness" of a sequence
{s(n)}, to be the fraction of runs with length less than
or equal to a small integer k.


\[
\rho(k) = \frac{\displaystyle\sum_{j=1}^{k} \left(\text{number of runs of length } j \text{ in the sequence}\right)}{\text{total number of runs in the sequence}}
\]                                                        {7.22}
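The run counting and the cumulative distribution of Eq. {7.22} can be checked on the eleven-point example sequence given above. This is a minimal sketch: the function names run_lengths and rho are hypothetical, a zero sample is arbitrarily counted as positive, and runs at the two ends of the sequence are counted even though they are bordered on one side only, as in the example.

```python
def run_lengths(s):
    """Lengths of the maximal same-sign runs in s, in order of occurrence.
    A zero sample is counted as positive (an assumption of this sketch)."""
    lengths = []
    current = 0
    previous_sign = None
    for x in s:
        sign = x >= 0
        if sign == previous_sign:
            current += 1
        else:
            if current:
                lengths.append(current)   # close the run that just ended
            current = 1
            previous_sign = sign
    if current:
        lengths.append(current)           # close the final run
    return lengths

def rho(s, k):
    """Fraction of runs in s whose length is less than or equal to k."""
    lengths = run_lengths(s)              # s must be non-empty
    return sum(1 for length in lengths if length <= k) / len(lengths)

s = [-.2, -.5, +.3, -.1, +.1, +.3, +.05, -.1, -.2, -.05, -.04]
print(run_lengths(s))                     # [2, 1, 1, 3, 4]
print([rho(s, k) for k in (1, 2, 3, 4)])  # [0.4, 0.6, 0.8, 1.0]
```

The five runs found agree with the count given for the example: two runs of length 1 and one run each of lengths 2, 3 and 4.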


A random sequence should primarily have runs of low
size. Therefore ρ(k) should increase rapidly for small
values of k. Table 20 contains the values of ρ(k) versus k
for the five error residual sequences from Test 1 through
Test 5. We also include the values of ρ(k) versus k for
both the random additive noise sequence {v(n)} and the input
sequence {u(n)}, as a comparison basis. These are denoted
as Test 6 and Test 7, respectively. This calculated data is
graphically  presented  in  Figure  26.   A  split  format 
presentation  is  used  in  Figure  26  to  better  distinguish  the 
different  distribution  of  runs  plots.   The  abbreviation  SIGA 
is  used  for  the  Search  Indicator  Growth  Algorithm  where 
space  is  limited. 
TEST DESCRIPTION                 CUMULATIVE DISTRIBUTION OF RUNS
                            ρ(1)     ρ(2)     ρ(3)     ρ(4)     ρ(5)

1: SIGA GROWTH OF A         .412     .519     .779     .860     .914
   RECURSIVE BVM USING
   NOISY SYSTEM DATA.

2: SIGA GROWTH OF NON-      .475     .733     .857     .925     .951
   RECURSIVE VOL MODEL
   USING NOISY SYSTEM
   DATA.

3: SIGA GROWTH OF A         .474     .711     .838     .904     .954
   RECURSIVE BVM USING
   OUTPUT DATA FROM VOL
   MODEL OF TEST 2.

4: DIRECT EVALUATION        .363     .605     .768     .856     .912
   OF EXACT MODEL USING
   NOISY SYSTEM DATA.

5: DIRECT EVALUATION        .463     .715     .846     .913     .959
   OF EXACT MODEL USING
   OUTPUT DATA FROM VOL
   MODEL OF TEST 2.

6: ADDITIVE NOISE           .483     .733     .866     .928     .969
   SEQUENCE

7: INPUT SEQUENCE           .520     .745     .869     .930     .965

TABLE 20: Cumulative distribution of runs of varying length
for the error residuals and other sequences in
Experiment 7
Test 1: SIGA growth with recursive BVM and noisy system data
Test 2: SIGA growth with nonrecursive VOL and noisy system data
Test 3: SIGA growth with recursive BVM and output data from
        VOL model of Test 2
Test 4: Direct evaluation of exact model using noisy system data
Test 5: Direct evaluation of exact model using output data
        from VOL model of Test 2
Test 6: Additive noise sequence
Test 7: Input sequence

FIGURE 26: Plots of cumulative distribution of runs of
varying length for the error residuals and
other sequences in Experiment 7


Figure  26  demonstrates  the  power  of  the  randomness 
factor ρ(k). The curves corresponding to Test 6 and Test 7
represent  the  randomness  of  the  most  random  sequences  in 
this  experiment,  the  additive  output  noise  and  the  input 
probe.   By  comparing  the  cumulative  distribution  of  runs 
curves  for  different  residual  sequences,  we  conjecture  that 
the  curve  closest  to  that  of  Test  6  is  the  most  random 
sequence.   Except  for  pathological  cases  (e.g.  no  additive 
noise),  it  follows  that  no  error  residual  sequence  can  be 
more  random  than  our  uncorrelated  additive  output  noise. 
The  plots  of  figures  like  Figure  26  provide  an  alternate 
means  of  evaluating  the  randomness  of  sequences. 
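
The cumulative distribution of runs used in these comparisons can be sketched as follows. This is only an illustration in Python with NumPy: the definition of p(k) is given in an earlier chapter, so the definition used here (the fraction of runs of length at most k, where a run is a maximal stretch of consecutive samples on the same side of the sample median) is an assumption, as are the function name and the test sequences.

```python
import numpy as np

def cumulative_runs_distribution(x, max_len=6):
    """Return [p(1), ..., p(max_len)], where p(k) is the fraction of runs
    whose length is at most k.  A run is taken here to be a maximal stretch
    of consecutive samples on the same side of the sample median
    (an assumed definition, not necessarily the one used in the thesis)."""
    x = np.asarray(x, dtype=float)
    above = x > np.median(x)                  # side indicator for each sample
    # A new run starts wherever the side indicator changes.
    change = np.flatnonzero(above[1:] != above[:-1]) + 1
    bounds = np.concatenate(([0], change, [len(x)]))
    run_lengths = np.diff(bounds)
    return [float(np.mean(run_lengths <= k)) for k in range(1, max_len + 1)]

rng = np.random.default_rng(0)
white = rng.uniform(-1.0, 1.0, 5000)          # independent, probe-like sequence
smooth = np.convolve(white, np.ones(8) / 8, mode="valid")   # correlated sequence

p_white = cumulative_runs_distribution(white)
p_smooth = cumulative_runs_distribution(smooth)
```

Under this assumed definition the run lengths of an independent sequence are approximately geometric, so p(1) is near 0.5 and p(2) near 0.75. A correlated sequence has longer runs, and its curve lies below that of the independent sequence at small k.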

Analysis  of  Figure  26  provides  a  clear  picture  of  the 
results  of  Experiment  7.   We  see  that  the  model  of  Test  3  is 
superior  to  the  model  of  Test  1.   Additionally,  if  we  could 
improve  our  model  growth  technique  and  somehow  obtain  the 
exact  form  of  the  model,  the  N-R  technique  would  provide  us 
with  the  model  of  Test  5.   This  is  significantly  superior  to 
the  model  of  Test  4,  the  best  we  could  hope  to  obtain  using 
the  existing  techniques  in  the  literature.   We  conclude  that 
the  N-R  technique  and  the  cumulative  distribution  plots  can 
improve  our  systems  characterization  methods  when  we  are 
faced  with  the  Case  1A  situation. 

C.   REAL  WORLD  EXPERIMENTS 

This section presents the results of an experiment using
real world data sequences.  The results are not as dramatic
as those of the controlled experiments of the previous
section.  The actual form of the system equation is
unknown, as are the specific properties of the input
sequence and any measurement noise.

Experiment  8 

The  New  London,  Connecticut,  Laboratory  of  the  Naval 
Underwater  Systems  Center  has  been  engaged  in  a  continuing 
series  of  research  efforts  aimed  at  accurately  modeling  a 
particular  path  in  the  ocean.   One  set  of  experiments 
involved  injecting  a  signal  into  a  transmitting  hydrophone, 
and  measuring  the  resulting  signal  at  a  distant  receiving 
hydrophone.  Three sets of these input and output signals
were sampled, converted to digital format, loaded into
computer files, and made available for experimentation
(see footnote 12).  We denote the different input sequences
of length 1024 as CH1IN, CH2IN, and CH3IN.  The corresponding
output sequences are denoted as CH1OUT, CH2OUT, and CH3OUT.

The sequences were measured over a suitably short time
interval, and we therefore consider the acoustic path to be
time invariant during the period of the measurements.  It is
expected that ambient noise and signals from other sources
are received at the receiving hydrophone.  We are
therefore faced with a difficult real world example of the
Case 1A conditions described in Chapter VI.

12  These computer files were made available by Mr. Steve
Capizzano of NUSC on 12 October 1981.  No details were
available regarding any potential model form or the
characteristics of the input or noise sequences.

We  first  calculate  the  sample  autocorrelation  values  for 
the  three  input  sequences.   These  values  are  summarized  in 
Table  21,  and  graphically  presented  in  Figure  27. 


                  NORMALIZED AUTOCORRELATION VALUES
INPUT SEQUENCE    r(1)     r(2)     r(3)     r(4)     r(5)     r(6)     r(7)

1: CH1IN          .0313   -.3069    .2131   -.1944   -.2422    .1959    .1260
2: CH2IN         -.1022   -.4193    .3630   -.1216   -.3309    .1695    .1222
3: CH3IN         -.0040   -.3026    .1913   -.2758   -.3072    .1630    .1366


TABLE 21: Normalized autocorrelation values for various
input sequences in Experiment 8
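
Normalized sample autocorrelation values of this kind can be computed along the following lines. This is a minimal sketch in Python with NumPy. The function name, the removal of the sample mean, and the normalization by the lag-zero value are assumptions, since the estimator used in this research is defined in an earlier chapter.

```python
import numpy as np

def normalized_autocorrelation(x, max_lag=7):
    """Return [r(1), ..., r(max_lag)], each lag product sum divided by the
    lag-zero sum, after removing the sample mean (assumed estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    r0 = np.dot(x, x)
    return [float(np.dot(x[:-k], x[k:]) / r0) for k in range(1, max_lag + 1)]

# Illustrative sequences only; these are not the CH1IN to CH3IN data.
alternating = np.array([1.0, -1.0] * 4)
r_alt = normalized_autocorrelation(alternating, max_lag=2)

rng = np.random.default_rng(0)
r_white = normalized_autocorrelation(rng.uniform(-1.0, 1.0, 1024))
```

For a sequence of 1024 independent samples the values r(k) are expected to be small, so magnitudes of 0.3 to 0.4 at low lags indicate a noticeably correlated input.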

We  also  calculate  the  cumulative  distribution  of  runs 
values  for  these  input  sequences.   These  values  are  listed 
in  Table  22,  and  graphically  presented  in  Figure  28. 


                  CUMULATIVE DISTRIBUTION OF RUNS
INPUT SEQUENCE    p(1)    p(2)    p(3)    p(4)    p(5)    p(6)

1: CH1IN          .377    .781    .876    .942    .986    .992
2: CH2IN          .375    .844    .912    .961    .980    .994
3: CH3IN          .395    .781    .877    .943    .984    .996


TABLE 22: Cumulative distribution of runs values for
various input sequences in Experiment 8

FIGURE 27: Normalized autocorrelation plots for various input
signals in Experiment 8

FIGURE 28: Plot of cumulative distribution of runs for various
input sequences in Experiment 8

The form of the curves in Figure 27 and Figure 28
indicates that some problems should be expected.  The input
sequences  are  not  as  random  as  those  of  Experiments  1 
through  7.   Chapter  VI  mentioned  that  this  condition  may 
give  our  growth  techniques  some  difficulty.   We  digress 
momentarily  to  expand  on  this  point. 

In  our  computer  simulated  experimental  research  of  model 
growth,  an  input  sequence  {u(n)}  based  on  an  independent 
random  generator  distributed  over  the  amplitude  range  of 
interest  was  used.   In  many  real  world  problems,  we  are 
given  input  and  output  sequences  that  are  simply  time-series 
values  of  the  available  continuous  time  signals.   Often  the 
available  input  sequence  may  not  be  sufficiently  random,  and 
significant  autocorrelations  may  result  in  the  error 
residual,  even  with  an  adequate  model  form.   Additive  output 
noise  contributing  to  the  error  residual  may  also  result  in 
significant  autocorrelations. 

Least  squares  model  evaluation  does  not  require  any 
assumptions  such  as  independence  of  the  error  residual 
values,  but  our  candidate  term  selection  and  evaluation 
techniques  can  give  degraded  or  misleading  results  in  this 
case.   The  "goodness-of-fit"  tests  used  in  Experiment  7  may 
not  produce  suitable  results  in  these  cases. 

This experiment is continued in an attempt to gain
further insight into this common situation, but
expectations for a successful characterization are limited.

Based on the preceding figures, the first 500 data
points of CH3IN and CH3OUT are selected as the input and
output sequences for the characterization experiment.  We
first  grow  a  recursive  linear  ARMA  model  by  the  M-Directed 
block  form  growth  technique  (with  d=1).   This  is  included 
for  comparison  since  it  is  equivalent  to  the  Box  and  Jenkins 
technique  commonly  used  in  the  literature.   This  first 
modeling  example  is  denoted  as  Test  1  and  the  results  are 
presented  in  Table  23. 

Chapter VI mentioned that it was possible to evaluate
models by regression analysis.  This involves considering a
large set of candidate model terms, evaluating the exact
reduction in the fitting error that would result from
bringing each candidate term into the model, and then adding
the best performing term, one term per iteration.  This is
similar to picking the one candidate model term with the
largest value of the search indicator I(j,8) given by
Eq. {6.17}.  Recall from Tables 5 and 6 that the cost of
computing I(j,8) is much higher than the cost for I(j,12).
The results of a regression analysis of the experimental
data using the candidate model term set defined by a
BVM(1,9) are included for comparison.  This is denoted as
Test 2, and results are presented in Table 24.
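
The regression analysis procedure described above can be sketched as a brute-force forward selection. This is an illustration in Python with NumPy, not the implementation used for Test 2. The function name, the use of the sum of squared residuals as the fitting error, and the small synthetic data set are assumptions.

```python
import numpy as np

def forward_regression(candidates, y, n_terms):
    """Grow a model one term per iteration.  At each iteration every unused
    candidate column is tried, the least squares fit is recomputed exactly,
    and the candidate giving the lowest fitting error is kept."""
    n_candidates = candidates.shape[1]
    selected, errors = [], []
    for _ in range(n_terms):
        best_j, best_err = None, np.inf
        for j in range(n_candidates):
            if j in selected:
                continue
            A = candidates[:, selected + [j]]
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            err = float(np.sum((y - A @ coef) ** 2))
            if err < best_err:
                best_j, best_err = j, err
        selected.append(best_j)
        errors.append(best_err)
    return selected, errors

# Hypothetical data: candidate terms u(n-1) ... u(n-4), noiseless output.
rng = np.random.default_rng(0)
N, MEM = 400, 4
u = rng.uniform(-1.0, 1.0, N + MEM)
cand = np.column_stack([u[MEM - d:MEM - d + N] for d in range(1, MEM + 1)])
y = 0.8 * cand[:, 1] - 0.5 * cand[:, 3]     # y(n) = 0.8 u(n-2) - 0.5 u(n-4)

selected, errors = forward_regression(cand, y, n_terms=2)
```

Every candidate requires its own least squares solution at every iteration, which is the source of the high computational cost of the exact evaluation.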

We  next  use  the  system  data  sequences  to  grow  a 
recursive  linear  BVM  using  the  Search  Indicator  Growth 
Algorithm.   This  is  denoted  as  Test  3,  and  results  are 
presented  in  Table  25.   The  model  from  Test  3  performs 
better at each growth iteration than the model from Test 1.
The  results  of  Test  3  are  almost  identical  to  the  regression 
analysis  results  of  Test  2  at  each  growth  iteration.   It  can 
be  shown  from  the  developments  in  Chapter  VI,  that  the 
Search  Indicator  Growth  Algorithm  requires  significantly 
fewer  computations  than  regression  analysis. 

For  Test  4,  we  use  the  system  data  sequences  and  grow  a 
more  general  recursive  nonlinear  BVM  using  the  Search 
Indicator  Growth  Algorithm.   This  enables  us  to  see  if  a 
nonlinear  model  would  provide  a  better  fit  than  the 
previously  analyzed  linear  ARMA  form.   The  results  are 
presented  in  Table  26. 

Using  the  VOL  model  form  of  Eq.  {2.4}  and  the  Search 
Indicator  Growth  Algorithm,  we  next  grow  a  nonrecursive 
nonlinear  model  (first  phase  of  the  N-R  technique).   We  give 
the technique freedom to consider all terms in the VOL(2,9)
model.   This  is  denoted  as  Test  5,  and  results  are  given  in 
Table  27. 

After  evaluating  the  coefficients  of  the  final  VOL  model 
from  Test  5,  this  model  is  synthesized  on  the  computer,  and 
probed  with  the  stored  version  of  the  input  sequence  {u(n)}. 
The  resulting  model  output  sequence  {y(n)}  is  stored,  and 
used  with  {u(n)}  to  grow  a  recursive  model  of  the  BVM  form 
using  the  Search  Indicator  Growth  Algorithm  (second  phase  of 
the  N-R  technique).   This  model  growth  is  denoted  as  Test  6, 
and  the  results  are  presented  in  Table  28. 
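
The two phases of the N-R technique described in Tests 5 and 6 can be illustrated with a simplified linear sketch in Python with NumPy. The model structures are fixed in advance and fitted by ordinary least squares, so this does not use the Search Indicator Growth Algorithm or the nonlinear VOL and BVM forms. The "true" system, the noise level, and all variable names are hypothetical and serve only to manufacture data.

```python
import numpy as np

rng = np.random.default_rng(1)
N, MEM = 4000, 9
u = rng.uniform(-1.0, 1.0, N)

# Hypothetical recursive system: y(n) = 0.5 y(n-1) + u(n-1)
y = np.zeros(N)
for n in range(1, N):
    y[n] = 0.5 * y[n - 1] + u[n - 1]
z = y + 0.5 * rng.standard_normal(N)        # measured output with additive noise

def lagged(x, delays, start):
    """Columns x(n-d) for each d in delays, rows n = start .. len(x)-1."""
    return np.column_stack([x[start - d:len(x) - d] for d in delays])

# Phase 1: nonrecursive model z(n) ~ sum_k b_k u(n-k), k = 1..MEM
U = lagged(u, range(1, MEM + 1), MEM)
b_fir, *_ = np.linalg.lstsq(U, z[MEM:], rcond=None)

# Synthesize the phase-1 model and probe it with the stored input.
y_model = np.zeros(N)
y_model[MEM:] = U @ b_fir

# Phase 2: recursive model y_model(n) ~ a y_model(n-1) + b u(n-1)
start = MEM + 1
A_nr = np.column_stack([y_model[start - 1:N - 1], u[start - 1:N - 1]])
theta_nr, *_ = np.linalg.lstsq(A_nr, y_model[start:], rcond=None)

# For comparison: the same recursive model fitted directly to the noisy data.
A_direct = np.column_stack([z[start - 1:N - 1], u[start - 1:N - 1]])
theta_direct, *_ = np.linalg.lstsq(A_direct, z[start:], rcond=None)
```

In this synthetic case the noisy past outputs bias the direct estimate of the recursive coefficient toward zero. The second phase works with the output of the nonrecursive model instead, and recovers a value closer to 0.5. With real measurement sequences, where the input is correlated and the true model form is unknown, no such improvement is guaranteed.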

The preceding tables and figures support the previously
stated  concerns  regarding  the  randomness  of  the  input 
sequence  {u(n)}.   The  results  of  Test  1  through  Test  6  are 
all  somewhat  disappointing.   Experience  with  many  controlled 
experiments  leads  to  the  conclusion  that  the  correlations 
within  the  input  sequence  are  the  main  reason  for  these 
results  in  Experiment  8.   It  should  be  mentioned,  however, 
that  another  possible  contributing  problem  is  that  the  form 
of  the  BVM  might  not  be  appropriate  for  the  physical  system 
we  are  attempting  to  represent. 

The  results  obtained  in  this  experiment  are  the  reason 
we  stated  that  systems  characterization  is  a  trial  and  error 
process  (Chapter  V).   The  choice  of  model  form  and  the 
characteristics  of  the  available  (or  hopefully  controllable) 
input  probe  are  extremely  important.   These  ultimately  must 
be  selected  by  the  user  based  on  all  available  quantitative 
and  non-quantitative  factors. 

Despite the high fitting error obtained in the various
tests of Experiment 8, several results are embedded within
Tables 23 through 28.  The least squares techniques are
designed to minimize the fitting error J^2(i) while growing
the model.  The performances of Test 1 through Test 4 are
compared further.  Figure 29 is a plot of J(i), the square
root of the fitting error, versus the total number of model
terms after each iteration.

TEST 1: M-DIRECTED
TEST 2: REGRESSION ANALYSIS
TEST 3: SIGA USING LINEAR MODEL FORM
TEST 4: SIGA USING NONLINEAR FORM


FIGURE 29: Plot of the square root of the fitting error versus
the total number of model terms after each growth
iteration, for various tests in Experiment 8

The results of Figure 29 are very interesting.  The
M-Directed technique (Test 1) reduced the fitting error as the
order  of  the  ARMA  model  was  increased,  but  the  results  are 
significantly  poorer  than  the  regression  analysis  technique 
(Test  2).   The  Search  Indicator  Growth  Algorithm  using  the 
ARMA  model  form  (Test  3)  nearly  duplicated  the  performance 
of  the  regression  analysis.   We  previously  showed  that  the 
Search  Indicator  Growth  Algorithm  offered  substantial 
computational  savings  over  regression  analysis  (Chapter  VI). 
Figure  29  verifies  that  even  for  real  world  measurement 
sequences,  the  Search  Indicator  Growth  Algorithm  can  perform 
systems  characterization  with  results  that  are  equivalent  to 
the  best  existing  technique. 

The Search Indicator Growth Algorithm using the BVM(2,9)
model form (Test 4) provided equivalent or better
performance than the previous three growth techniques.  This
comparison includes models with the same number of total
terms.  Allowing the algorithm to consider nonlinear terms
resulted in some of them being chosen over the candidate
linear terms.

Even though we have been primarily interested in
minimizing the fitting error J^2(i), it is interesting to
look at the maximum and minimum values in the error residual
sequences resulting from each type of model growth.  Figure
30 graphically presents this information obtained from Table
23 through Table 26.

TEST 1: M-DIRECTED
TEST 2: REGRESSION ANALYSIS
TEST 3: SIGA USING LINEAR MODEL FORM
TEST 4: SIGA USING NONLINEAR MODEL FORM


FIGURE 30: Plot of the maximum and minimum values of the error
residual sequence versus the total number of model
terms after each iteration, for various tests in
Experiment 8

Figure 30 shows that there is a generally decreasing
trend  in  the  amplitude  of  both  the  maximum  and  minimum 
residual  sequences  as  more  model  terms  are  added  by  the 
growth  techniques.   Note  that  the  decreasing  trend  is  more 
pronounced  in  the  regression  analysis  case  (Test  2)  than  the 
Box  and  Jenkins  case  (Test  1).   The  Search  Indicator  Growth 
Algorithm  with  ARMA  model  form  (Test  3)  performed  reasonably 
close  to  Test  2,  and  the  best  performance  was  obtained  when 
the  Search  Indicator  Growth  Algorithm  was  used  with  the 
BVM(2,9) form (Test 4).  The model equation resulting from
Iteration 8 of Test 4 provided the best combination of low
fitting error, good autocorrelation properties of the
residual, and good cumulative distribution of runs of the
residual.  This 17 term model equation is:

y(n) =  .9345E-1 u(n-5)       - .9122E+0 y(n-2)       - .8470E+0 y(n-4)
      - .5365E+0 y(n-6)       + .1064E+0 y(n-4)y(n-4)
      + .2415E+0 y(n-9)y(n-9) - .3279E+0 u(n-1)
      - .3585E+0 u(n-5)       + .1007E+0 y(n-2)y(n-9)
      - .3653E+0 u(n-3)       - .1813E+0 y(n-8)       - .1254E+0 y(n-7)
      + .1636E+0 y(n-1)y(n-4) - .2062E+0 y(n-5)y(n-7)
      - .3198E+0 u(n-1)y(n-7) - .3023E+0 u(n-5)y(n-6)
      + .2965E+0 u(n-6)y(n-1)                                  {7.23}
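
A model equation of this form, a sum of coefficients times products of delayed input and output values, can be evaluated term by term. The sketch below, in Python with NumPy, shows one way to store such a model as a table of terms and to form the error residual by one-step prediction from the measured past outputs. The three-term model, its coefficients, and the function names are hypothetical and are not taken from Eq. {7.23}.

```python
import numpy as np

# Each term is (coefficient, input delays, output delays).
HYPOTHETICAL_TERMS = [
    (0.5, (1,), ()),        #  0.5 u(n-1)
    (-0.3, (), (2,)),       # -0.3 y(n-2)
    (0.1, (1,), (1,)),      #  0.1 u(n-1) y(n-1)
]

def predict_one_step(terms, u, y, n):
    """Model output at time n computed from past measured u and y values."""
    total = 0.0
    for coef, u_delays, y_delays in terms:
        product = coef
        for d in u_delays:
            product *= u[n - d]
        for d in y_delays:
            product *= y[n - d]
        total += product
    return total

def residual_sequence(terms, u, y, memory):
    """Error residual e(n) = y(n) - model output, for n = memory .. len(y)-1."""
    return np.array([y[n] - predict_one_step(terms, u, y, n)
                     for n in range(memory, len(y))])

u_demo = [1.0, 2.0, 3.0]
y_demo = [1.0, 1.0, 0.9]
y_hat = predict_one_step(HYPOTHETICAL_TERMS, u_demo, y_demo, 2)
e_demo = residual_sequence(HYPOTHETICAL_TERMS, u_demo, y_demo, memory=2)
```

The residual sequence produced this way is the input to the randomness checks discussed earlier, such as the normalized autocorrelation and the cumulative distribution of runs.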

Tables  27  and  28  indicate  that  the  N-R  technique  failed 
to  improve  the  characterization.   Several  reasons  could 
exist  for  this  result.   The  previously  discussed  problems 
with  the  characteristics  of  the  input  signal  could  be  a 
major factor.  The normalized autocorrelation values and
cumulative  distribution  of  runs  values  in  these  tables 
indicate  that  the  residuals  are  probably  not  random  enough. 
If  additive  output  noise  exists  in  {y(n)},  it  may  be 
correlated  with  itself  or  with  the  input  {u(n)}.   The  only 
conclusion  we  can  reach  on  this  point  is  that  in  this 
experiment,  the  N-R  technique  did  not  offer  any  advantages 
to  the  characterization  problem. 

E.   SUMMARY  OF  EXPERIMENTAL  RESULTS 

The  results  of  these  experiments  show  that  block-form 
recursive  modeling  of  nonlinear  systems  typically  produces 
(1)  non-parsimonious  models  (e.g.  contain  unneeded  terms 
with  significantly  non-zero  coefficient  values),  (2)  higher 
fitting  error  J,  (3)  higher  condition  number  for  the  least 
squares  matrix  (and  therefore  larger  variance  in  the 
coefficient  estimates),  and  (4)  distorted  coefficient 
estimates  on  the  correct  model  terms.   The  block-form 
techniques  also  require  the  availability  of  a  larger  number 
of  data  measurements,  have  a  much  larger  computational  cost, 
and  often  fail  to  converge  on  an  adequate  model  because  of 
excessive  ill-conditioning  of  the  least  squares  matrix  or 
the  limited  amount  of  available  data. 

The  block-form  growth  techniques  make  no  provisions  for 
handling  near  colinearity  in  the  candidate  model  set.   This 
typically  results  in  abnormally  high  condition  number  for 
the  least  squares  matrix  (and  therefore  larger  variance  in 
the coefficient estimates).  Experiment 2 showed the typical
results  of  this  weakness. 

The  block-form  techniques  force  a  restrictive  set  of 
terms  to  be  fully  considered  at  each  growth  iteration.   This 
typically  results  in  a  significant  number  of  unnecessary 
terms in the model equation, increasing the computational
cost  and  contributing  to  other  problems.   Experiments  2,  3, 
and  4  are  examples  of  this  situation. 

The  block-form  techniques  require  the  availability  of  a 
larger  number  of  data  measurements  than  the  Search  Indicator 
Growth  Algorithm.   Therefore,  with  limited  data,  there  are 
many  cases  where  we  will  be  unable  to  grow  an  adequate  model 
for  an  application  using  block-form  techniques.   Experiment 
5  provided  a  meaningful  example  of  this  situation. 

The  Search  Indicator  Growth  Algorithm  can  better  select 
its  starting  base  model  by  using  I(j,12)  at  iteration  1.   In 
this  way  it  can  recognize  the  existence  of  a  delay  factor  L 
in  the  system,  and  can  start  the  growth  iteration  at  the 
appropriate  term  (Experiment  3). 

Finally,  the  form  of  the  Search  Indicator  Growth 
Algorithm  allows  for  simple  extensions  that  can  be  used  to 
better  handle  real  world  conditions  like  additive  output 
noise.   The  averaging  algorithm  discussed  in  Chapter  VI 
offers  some  improvement  when  conditions  permit  its  use 
(Experiment  6).   The  two-stage  "N-R"  algorithm  proposed  in 
Chapter  VI  has  some  capabilities  for  improving  system 
characterization when we cannot probe the system (Experiment
7).   A  real-world  example  (Experiment  8)  shows  that  the 
Search  Indicator  Growth  Algorithm  can  characterize  a  system 
with  results  that  are  equivalent  to  or  better  than  existing 
techniques.   The  inability  to  control  the  input  probe  was 
shown  to  degrade  all  of  these  techniques. 

Based  on  experience  with  many  characterization 
experiments,  an  important  factor  appears  to  be  the  selected 
amplitude  range  for  the  input  probe  signal.   If  this  range 
is  too  small,  the  resultant  signals  specified  by  the 
candidate  model  terms  typically  are  highly  colinear.   This 
increases  the  ill  conditioning  of  the  least  squares  matrix 
and  degrades  the  performance  of  the  growth  algorithm. 

If  the  input  probe  is  selected  to  be  too  large,  the 
system  output  may  be  unbounded.   It  is  not  possible  or 
meaningful  to  continue  the  characterization  experiment  in 
this  case.   Experimentation  may  be  necessary  to  obtain  a 
suitable  input  probe. 

The  Search  Indicator  based  growth  techniques  have  been 
shown  to  be  superior  to  block  form  techniques,  but  further 
work  still  remains  to  be  done.   The  whole  systems 
characterization  problem  is  not  yet  solved. 

VIII.   APPLICATIONS, CONCLUSIONS, AND AREAS FOR FURTHER RESEARCH

A.   DISCUSSION  OF  APPLICATIONS 

A  main  reason  for  experimentally  modeling  a  system  is  to 
better  understand  the  nature  of  what  is  actually  happening 
in  the  system.   Another  reason  for  modeling  is  to  use  models 
in  designing  controllers  or  estimators,  and  for  simulating 
systems  to  predict  behavior.   The  model  can  serve  to  confirm 
existing  beliefs  about  functional  relationships.   A  stronger 
concept  is  that  the  use  of  model  growing  techniques  could 
lead  to  the  prediction  of  a  physically  significant  effect  of 
which  the  application  user  might  be  unaware.   If  new 
information  regarding  the  system  can  be  uncovered,  we  may  be 
better  able  to  understand  the  inner  workings  of  the  system. 

Three  current  applications  for  accurate  experimentally 
determined models are: (1) fault detection, (2) fault
evaluation,  and  (3)  reduced-order  modeling.   We  discuss  each 
of  these  in  terms  of  the  techniques  of  this  research,  and 
describe  some  new  capabilities  that  appear  to  be  useful. 

Once  a  model  with  acceptable  performance  for  a 
particular  application  has  been  obtained  from  the  set  of 
measurement  data,  we  cannot  be  sure  that  there  are  no  other 
equivalently  performing  models  with  different  sets  of  model 
terms  and  coefficient  values.   Such  equivalent  models  may 
have  been  uncovered  during  the  model  growth  iterations.   If 
we somehow obtained equivalent models with different sets of
terms, we must have a means of picking the "best".

Chapter I mentioned that the criterion for best model is
application dependent.  In terms of performance modeling,
the following criteria appear best.  For fault detection
applications, the optimum criterion is maximum sensitivity
of the error residual to changes in each system parameter.
For fault evaluation applications, the optimum criterion is
maximum distinguishability of the system characteristic that
has changed.  For reduced-order modeling applications, the
optimum criterion is the best performance of a finite term
model in duplicating the behavior of the system.

One  general  criterion  that  appears  to  be  a  good 
compromise  is  based  on  Ockham's  Razor,  "...  the  simplest 
model  is  the  best  ...".   We  define  the  simplest  BVM  that 
adequately  represents  the  system  performance  as  the  one  with 
the  smallest  number  of  terms,  and  the  lowest  degree  and 
memory  (when  the  number  of  terms  are  the  same).   This 
criterion  is  probably  not  optimum  for  all  applications. 
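
The simplicity criterion just defined can be written as an ordering on candidate models. The following Python sketch is only an illustration. The record fields, the error tolerance used to decide which models are "adequate", and the choice to compare degree before memory when the number of terms is equal are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CandidateModel:
    name: str
    n_terms: int
    degree: int
    memory: int
    fit_error: float

def simplest_adequate(models, max_error):
    """Among models whose fitting error is acceptable, return the one with
    the fewest terms; ties are broken by lower degree, then lower memory."""
    adequate = [m for m in models if m.fit_error <= max_error]
    if not adequate:
        return None
    return min(adequate, key=lambda m: (m.n_terms, m.degree, m.memory))

# Hypothetical candidates for illustration.
models = [
    CandidateModel("A", n_terms=9, degree=1, memory=9, fit_error=0.040),
    CandidateModel("B", n_terms=6, degree=2, memory=7, fit_error=0.045),
    CandidateModel("C", n_terms=6, degree=2, memory=5, fit_error=0.048),
    CandidateModel("D", n_terms=4, degree=1, memory=3, fit_error=0.200),
]
best = simplest_adequate(models, max_error=0.05)
```

Model D has the fewest terms but is rejected because its fitting error is not acceptable. Models B and C tie on terms and degree, and C is chosen for its shorter memory.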

Assuming  that  we  have  reduced  our  set  of  equivalent 
models  to  one  "best"  model,  there  still  are  two  main 
concerns.   First,  we  would  like  to  know  if  any  simpler 
equivalently  performing  model  exists.   It  is  conceivable 
that  particular  systems  might  be  better  modeled  by  a 
different  functional  form  than  BVM,  but  we  will  not  consider 
this  case.   If  a  model  is  obtained  by  one  of  the  block-form 
techniques of Chapter V, the experiments of Chapter VII
demonstrate  that  the  model  may  contain  a  large  number  of 
extra  terms.   If  the  models  were  obtained  by  the  Search 
Indicator  Growth  Algorithm,  unneeded  model  terms  may  have 
been  included  at  various  iterations.   We  want  the  most 
parsimonious (minimum number of terms) model that matches the
performance  of  the  system  within  acceptable  error. 

The  second  concern  is  how  to  efficiently  use  the  model 
in  applications.   If  the  model  is  simulated  and  used  as 
shown  in  Figure  1  of  Chapter  I,  a  running  average  of  the 
squared  difference  between  the  system  and  model  output  can 
be  monitored.   When  this  average  exceeds  a  threshold,  a 
fault  may  have  occurred  in  the  system.   Other  factors  that 
might also cause this condition include: (1) increased
additive  measurement  noise,  and  (2)  the  current  input  signal 
exceeding  the  amplitude  range  used  in  this  model's  growth. 
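
The monitoring scheme described above can be sketched in a few lines of Python with NumPy. The window length, the threshold, and the function name are assumptions chosen for illustration. In practice the threshold would be set from the fitting error observed during model growth.

```python
import numpy as np

def fault_flags(y_system, y_model, window, threshold):
    """Running average of the squared difference between system and model
    outputs, and a flag wherever that average exceeds the threshold.
    Entry i of the running average covers samples i .. i + window - 1."""
    e2 = (np.asarray(y_system, dtype=float) - np.asarray(y_model, dtype=float)) ** 2
    running = np.convolve(e2, np.ones(window) / window, mode="valid")
    return running, running > threshold

# Hypothetical data: the model tracks the system until sample 70,
# after which the system output departs from the model output by 1.0.
y_model = np.zeros(100)
y_system = np.zeros(100)
y_system[70:] = 1.0

running, flags = fault_flags(y_system, y_model, window=10, threshold=0.55)
first_flag = int(np.argmax(flags))
```

Here the first flagged window starts at sample 66 and ends at sample 75, so the change is reported six samples after it occurs. A longer window gives a smoother average at the cost of a longer detection delay.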

Once  a  fault  has  been  detected,  the  coefficient  values 
can  be  re-estimated  and  used  as  an  indication  of  the 
possible  kind  of  fault.   This  last  step  is  the  basis  of  the 
fault  evaluation  application.   The  work  that  follows  is 
designed  to  improve  the  efficiency  and  accuracy  of  the 
approach  to  both  of  these  modeling  concerns. 

A  possible  concept  is  to  select  the  subset  of  model 
terms  whose  coefficients  are  most  robust  to  variations  in 
conditions  that  are  unrelated  to  system  faults.   These 
coefficients  are  designated  as  the  "syndromes"  of  the  model, 
and  the  following  method  is  proposed  for  their  selection. 

Step 1:  Given the best performance fitting model,
         estimate the coefficient values p(i) that minimize
         the error criterion J^2(i) for different random input
         probes, each with approximately the same amplitude
         distribution.  Mark those coefficients whose
         estimates remain nearly constant from test to test
         as having the property "A".

Step 2:  Using the same model, estimate the
         coefficient values p(i) that minimize the error
         criterion J^2(i) for different ranges of input
         amplitude (continue to use a uniform amplitude
         distribution).  Limit this range of input amplitude
         to the known or assumed range of the actual
         operating input of the system.  Mark those
         coefficients whose estimates remain nearly constant
         from test to test as having the property "B".

Coefficients  with  both  the  "A"  and  "B"  property  are 
conjectured  to  be  robust  to  variations  in  both  the  input 
probe  level  and  the  particular  probe  contents.   These 
syndromes  should  therefore  be  most  sensitive  to  changes  in 
the  system  under  test.   It  is  logical  to  expect  that  the 
final  model  obtained  by  the  performance  modeling  based 
Search  Indicator  Growth  Algorithm  of  Chapter  VI,  would 
provide  a  superior  fault  detection  signal  threshold,  and 
better  fault  evaluation  syndromes,  than  the  models  obtained 
by  the  block-form  growth  algorithms  of  Chapter  V. 
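
The two-step selection of syndromes proposed above can be sketched as follows in Python with NumPy. The coefficient estimates are assumed to have been obtained already, one row per test. The numerical values, the 5 percent relative tolerance used for "nearly constant", and the function name are hypothetical.

```python
import numpy as np

def nearly_constant(estimates, rel_tol=0.05):
    """For each coefficient (column), report whether its spread across the
    tests (rows) is within rel_tol of the magnitude of its mean value."""
    est = np.asarray(estimates, dtype=float)
    spread = est.max(axis=0) - est.min(axis=0)
    scale = np.abs(est.mean(axis=0))
    return spread <= rel_tol * scale

# Step 1: estimates from different random probes, same amplitude distribution.
estimates_probes = [[1.00, 0.50, -0.200],
                    [1.01, 0.70, -0.200],
                    [0.99, 0.30, -0.205]]
# Step 2: estimates from different input amplitude ranges.
estimates_ranges = [[1.00, 0.50, -0.20],
                    [1.02, 0.50, -0.35]]

property_a = nearly_constant(estimates_probes)
property_b = nearly_constant(estimates_ranges)
syndromes = np.flatnonzero(property_a & property_b)   # coefficient indices
```

In this made-up example only the first coefficient holds steady in both steps, so it alone would be retained as a syndrome.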

All of the fault detection and evaluation methods in two
well referenced survey reports [Ref. 44 and 45], and other
recent papers [Ref. 46, 47 and 48], are based on using the
full  set  of  coefficients  of  the  obtained  model  form.   The 
preceding  development  suggests  that  the  set  of  syndromes 
would  provide  a  clearer  reference  for  recognizing  system 
faults  than  would  the  full  set  of  model  coefficients.   It  is 
also  expected  that  these  Search  Indicator  Growth  Algorithm 
developed  syndromes  would  be  superior  to  the  large  set  of 
Nonlinear  ARMA  lattice  coefficients  proposed  for  this 
application  by  Reference  5. 

The  reduced-order  modeling  problem  has  received 
considerable  attention  in  the  literature  [Ref.  49  and  50]. 
The  concept  is  to  determine  the  particular  finite  size 
(number  of  terms)  model  that  best  matches  the  performance  of 
some  system.   The  existing  techniques  attempt  to  fit  this 
problem  to  a  particular  parameter  estimation  form  of 
solution  (e.g.  recursive-in-order  model  form  and  the  use  of 
Levinson's  algorithm).   The  results  of  Chapter  III  and 
Appendix  3  indicate  that  these  methods  would  generally  lead 
to  suboptimal  models  unless  the  restrictive  assumptions  are 
met.   It  makes  more  sense  to  grow  a  model  using  a  general 
performance  modeling  technique  like  the  Search  Indicator 
Growth  Algorithm,  rather  than  disguising  the  problem  in  a 
parameter  estimation  form.   It  is  also  easier  to  optimize 
the  model  performance  while  limiting  the  number  of  model 
terms.  The experiments in Chapter VII contain many examples
showing  where  the  Search  Indicator  Growth  Algorithm  produced 
equal  size  models  with  vastly  superior  performance  compared 
with  recursive-in-order  techniques. 

B.   CONCLUSIONS 

The  purpose  of  this  research  was  to  extend  existing 
techniques  for  experimentally  developing  discrete-time  model 
equations  to  represent  the  input-output  behavior  of  linear 
and  nonlinear  systems. 

We started by dividing the problem into four key parts:
the  functional  form  of  the  model,  choice  of  error 
minimization  method,  efficiency  of  model  selection  and 
evaluation,  and  verification  of  the  quality  of  the  model  for 
various  current  applications.   After  a  discussion  of 
existing  discrete-time  model  forms,  we  adopted  the  more 
general  Bivariate  Volterra  Model  (BVM).   Various  error 
minimization  methods  were  examined,  and  it  was  shown  how  the 
Covariance  least  squares  method  is  generally  significantly 
superior  to  the  Autocorrelation  method  typically  used  in  the 
literature.   We  next  developed  expressions  for  the 
distortions  in  both  the  fitting  error  and  the  coefficient 
estimates  of  a  linear  recursive  model  form  when  there  exists 
additive  output  noise.   These  results  clearly  show  the 
effects  of  the  magnitude  of  the  recursive  coefficients  and 
the  sample  autocorrelation  values  of  the  noise  sequence. 

A  general  set  of  recursive  solution  and  evaluation 
equations  was  developed  for  computational  savings  and  as  a 
unifying basis for the work that followed.  This enabled us
to  easily  evaluate  a  wide  range  of  model  equations  without 
limiting  the  form  of  the  model  or  making  other  unnecessarily 
restricting  assumptions.   Existing  model  growth  techniques 
were  examined  and  extended  to  allow  consideration  of  the 
more  general  BVM  form.   Inherent  limitations  (e.g.  maximum 
number of model terms less than or equal to the number of
available  data  measurements)  were  recognized. 

A  major  goal  was  the  development  of  a  growth  technique 
that  could  perform  better  than  existing  techniques.   The 
concept  of  Search  Indicators  was  introduced,  and  led  to  the 
development  of  the  Search  Indicator  Growth  Algorithm  and 
related  special  techniques.   The  physical  interpretation  and 
significant  computational  savings  resulting  from  the  use  of 
this  algorithm  were  presented,  along  with  provisions  for 
handling  the  important  problem  of  colinearity  among  the 
model  terms.   Various  conditions  affecting  the  evaluation 
and  growth  of  models  were  examined,  and  several  special 
techniques  were  proposed. 

The  remainder  of  the  thesis  focused  on  experimental 
verification  of  the  strengths  and  weaknesses  of  the  various 
model  growth  techniques.   These  results  clearly  showed  the 
improved  performance  of  the  Search  Indicator  Growth 
Algorithm.   Some  specific  ideas  for  improving  the 
development  and  use  of  mathematical  models  in  several 
current  applications  were  also  presented. 

C.   AREAS FOR FURTHER RESEARCH

Various  interesting  questions  were  encountered  during 
the  development  of  this  thesis.   Those  listed  below  can  form 
the  basis  for  extension  of  this  work  and  are  recommended  as 
areas  for  further  research. 

1.  The  BVM  form  was  emphasized  in  this  research  but  the 
modeling  techniques  are  not  limited  to  terms  that 
only  contain  integer  powers,  and  products  of  powers, 
of  past  and  present  input  values  and  past  output 
values.   The  model  form  could  be  extended  to  include 
decaying  exponentials,  divide  functions,  and  other 
factors  of  the  measurements.   These  could  be 
explicitly  included,  or  we  could  approximate  factors 
like  exponentials  with  difference  equations. 

2.  The  Candidate  Model  Specification  Technique  is  a 
first  heuristic  approach  for  specifying  the 
candidate  terms  to  use  in  the  Search  Indicator 
Growth  Algorithm.   Techniques  for  decoding  the 
patterns in the residuals might lead to improved
methods.

3.  The development of the key search indicator I(j,12)
included the definition of a matrix H(i-1) given by
Eq. {6.25}. This matrix has several interesting
properties (e.g. positive semi-definite and
idempotent), and it might be possible to exploit
these to produce improved search indicators.

4.  The Search Indicator Growth Algorithm performance is
dependent on the choice of the candidate model terms
at each iteration, and on the values of the two
heuristic variables. These values could
either be kept fixed throughout the modeling
iterations, or possibly be made adaptive.

5.  The Search Indicator Growth Algorithm has been shown
to be useful for bringing terms into the model
equation. Since the candidate set is allowed to
expand after each growth iteration, it follows that
unneeded terms might exist in the model. It might
be possible to develop efficient search indicators
that operate on the existing set of model terms and
suggest which should be eliminated at each
iteration. Existing, but expensive, techniques
include backward regression [Ref. 40].

6.  Chapter VI made a case for uniform amplitude
distribution of the controllable input probe
sequence, but mentioned that other distributions
could provide better results in certain cases. For
example, a distribution that put emphasis on higher
amplitude values might be better suited for
recognizing specific strong nonlinearities in the
system.

7.  Additional modeling experience needs to be gained
with real world experiments.

APPENDIX A.   GENERAL RECURSIVE SOLUTION
OF A GROWING SET OF NORMAL EQUATIONS


The  solution  of  a  large  set  of  normal  equations  occurs 
in  various  fields,  including  systems  identification,  linear 
prediction,  and  least  squares  estimation.   Successive  sets 
of  normal  equations  must  be  solved,  where  the  preceding  set 
can  be  related  to  the  subsequent  set  in  the  partitioned  form 
described  below. 

Set 1:

$$ A(1)\,p(1) = h(1) \tag{A.1} $$

Set 2:

$$ A(2)\,p(2) = h(2) \tag{A.2} $$

where

$$ A(2) = \begin{bmatrix} A(1) & B(2/1) \\ B(2/1)^T & A(2/1) \end{bmatrix} \tag{A.3} $$

and

$$ h(2)^T = \begin{bmatrix} h(1)^T & h(2/1)^T \end{bmatrix} \tag{A.4} $$

Set i:

$$ A(i)\,p(i) = h(i) \tag{A.5} $$

where

$$ A(i) = \begin{bmatrix} A(i-1) & B(i/i-1) \\ B(i/i-1)^T & A(i/i-1) \end{bmatrix} \tag{A.6} $$

and

$$ h(i)^T = \begin{bmatrix} h(i-1)^T & h(i/i-1)^T \end{bmatrix} \tag{A.7} $$

Each set is of size c(i), where c(i-1) < c(i).

Rather than solving each set independently, the general
partitioned structure of these related sets of equations can
be exploited to obtain the solution to each set in an
efficient recursive manner. From Eq. {4.22}, we use the
notation q(i) = c(i) - c(i-1).

Given a set of c(i) linear normal equations;

$$ A(i)\,p(i) = h(i) \tag{A.8} $$

where we have a unique solution to the previous set of
c(i-1) normal equations;

$$ p(i-1) = A(i-1)^{-1} h(i-1) \tag{A.9} $$

Substituting {A.6} and {A.7} into {A.8} yields;

$$ p(i) = \begin{bmatrix} A(i-1) & B(i/i-1) \\ B(i/i-1)^T & A(i/i-1) \end{bmatrix}^{-1} \begin{bmatrix} h(i-1) \\ h(i/i-1) \end{bmatrix} \tag{A.10} $$

The partitioned matrix inversion theorem [Ref. 12, pp. 18]
permits exploitation of the symmetry of matrix A(i)
whenever $|A(i/i-1)| \neq 0$.

$$ p(i) = \begin{bmatrix} A(i-1)^{-1} + F(i)G(i)^{-1}F(i)^T & F(i)G(i)^{-1} \\ G(i)^{-1}F(i)^T & G(i)^{-1} \end{bmatrix} \begin{bmatrix} h(i-1) \\ h(i/i-1) \end{bmatrix} \tag{A.11} $$

where F(i) is a c(i-1) x q(i) matrix, and G(i) is a
q(i) x q(i) matrix, each defined below.

$$ F(i) = -A(i-1)^{-1} B(i/i-1) \tag{A.12} $$

$$ G(i) = A(i/i-1) - B(i/i-1)^T A(i-1)^{-1} B(i/i-1) = A(i/i-1) + B(i/i-1)^T F(i) \tag{A.13} $$

Expanding Eq. {A.11} produces;

$$ p(i) = \begin{bmatrix} A(i-1)^{-1}h(i-1) + F(i)G(i)^{-1}F(i)^T h(i-1) + F(i)G(i)^{-1}h(i/i-1) \\ G(i)^{-1}F(i)^T h(i-1) + G(i)^{-1}h(i/i-1) \end{bmatrix} \tag{A.14} $$

Define two vectors of size q(i);

$$ g(i) = h(i/i-1) + F(i)^T h(i-1) \tag{A.15} $$

$$ k(i) = G(i)^{-1} g(i) \tag{A.16} $$

Substituting {A.15} and {A.16} into {A.14}, and using {A.9};

$$ p(i) = \begin{bmatrix} p(i-1) + F(i)G(i)^{-1}g(i) \\ G(i)^{-1}g(i) \end{bmatrix} = \begin{bmatrix} p(i-1) + F(i)k(i) \\ k(i) \end{bmatrix} \tag{A.17} $$

$$ p(i) = \begin{bmatrix} p(i-1) \\ 0 \end{bmatrix} + \begin{bmatrix} F(i) \\ I \end{bmatrix} k(i) \tag{A.18} $$

Equation {A.18} is our desired answer and is presented
in Chapter IV as Eq. {4.35}.
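
As an illustration of Eqs. {A.12} through {A.18}, the following Python sketch performs one growth step on plain lists. The function names (`mat_mul`, `transpose`, `inverse`, `grow_solution`) and the small numerical example are assumptions made for this illustration and do not come from the thesis; the Gauss-Jordan inverse merely stands in for whatever method is used to invert G(i).

```python
def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (adequate for a small demo)."""
    n = len(M)
    aug = [list(map(float, row)) + [float(r == c) for c in range(n)]
           for r, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[piv] = aug[piv], aug[col]
        d = aug[col][col]
        aug[col] = [v / d for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def grow_solution(A_prev_inv, p_prev, B, A_new, h_prev, h_new):
    """Return p(i) from A(i-1)^-1, p(i-1) and the newly added blocks."""
    as_col = lambda v: [[x] for x in v]
    F = [[-x for x in row] for row in mat_mul(A_prev_inv, B)]            # {A.12}
    BtF = mat_mul(transpose(B), F)
    G = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A_new, BtF)]  # {A.13}
    Fth = mat_mul(transpose(F), as_col(h_prev))
    g = [hn + row[0] for hn, row in zip(h_new, Fth)]                     # {A.15}
    k = [row[0] for row in mat_mul(inverse(G), as_col(g))]               # {A.16}
    Fk = mat_mul(F, as_col(k))
    return [p + row[0] for p, row in zip(p_prev, Fk)] + k                # {A.18}


# Hypothetical data: a size-2 model grown by one term (q(i) = 1).
A1 = [[4.0, 1.0], [1.0, 3.0]]
h1 = [1.0, 2.0]
A1_inv = inverse(A1)
p1 = [row[0] for row in mat_mul(A1_inv, [[x] for x in h1])]
p2 = grow_solution(A1_inv, p1, [[0.5], [0.2]], [[2.0]], h1, [0.7])
```

In this sketch only the q(i) x q(i) matrix G(i) is inverted during the growth step; the result can be checked against a direct solution of the full set of c(i) equations.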

A compact recursive expression for the resulting minimum
average sum squared error, $J^2(i)$, can also be developed.

Substituting {4.4} into {4.3} for the (i)th model produces;

$$ J^2(i) = \frac{1}{N} z^T z - r(i)^T R(i)^{-1} r(i) = \frac{1}{N} z^T z - r(i)^T \theta(i) \tag{A.19} $$

Using the definitions of Eq. {4.9} through {4.12}, the
vectors r(i) and $\theta(i)$ can be rearranged in the form of
vectors h(i) and p(i), respectively. Substituting these
into Eq. {A.19} produces;

$$ J^2(i) = \frac{1}{N} z^T z - h(i)^T p(i) \tag{A.20} $$

Substituting {A.7} and {A.18} into {A.20} and simplifying
yields;

$$
\begin{aligned}
J^2(i) &= \frac{1}{N} z^T z - \begin{bmatrix} h(i-1)^T & h(i/i-1)^T \end{bmatrix} p(i) \\
&= \frac{1}{N} z^T z - h(i-1)^T p(i-1) - h(i-1)^T F(i) k(i) - h(i/i-1)^T k(i) \\
&= J^2(i-1) - \left[ h(i-1)^T F(i) + h(i/i-1)^T \right] k(i) \\
&= J^2(i-1) - g(i)^T k(i)
\end{aligned}
\tag{A.21}
$$

where we made use of {A.15}, {A.16}, and $J^2(i-1)$, the
previous evaluation of the (i-1)st model. This last
expression appears in Chapter IV as Eq. {4.36}.

APPENDIX B.   RELATIONSHIP OF THE GENERAL RECURSIVE ALGORITHM

TO LEVINSON'S ALGORITHM

When we are given a set of simultaneous linear
equations,

$$ A(i)\,p(i) = h(i) \tag{B.1} $$

where A(i) is a c(i) by c(i) real matrix, the direct
solution of Eq. {B.1} requires on the order of $[c(i)]^3/3$
multiplicative operations. When A(i) is symmetric and
positive definite, the solution can be accomplished with the
order of $[c(i)]^3/6$ multiplicative operations by various
techniques (e.g. Cholesky, LU decomposition, etc.).

Appendix A developed a general recursive solution for
p(i), based on the concept that the previous set of
c(i-1) < c(i) equations given by;

$$ A(i-1)\,p(i-1) = h(i-1) \tag{A.9} $$

has been previously evaluated for p(i-1) and $A(i-1)^{-1}$, where
the following partitioned matrix relationships exist.

$$ A(i) = \begin{bmatrix} A(i-1) & B(i/i-1) \\ B(i/i-1)^T & A(i/i-1) \end{bmatrix} \tag{A.6} $$

and

$$ h(i)^T = \begin{bmatrix} h(i-1)^T & h(i/i-1)^T \end{bmatrix} \tag{A.7} $$

This general solution equation for p(i) is repeated below;

$$ p(i) = \begin{bmatrix} p(i-1) \\ 0 \end{bmatrix} + \begin{bmatrix} F(i) \\ I \end{bmatrix} k(i) \tag{A.18} $$

where 0 is the null vector, I is the identity matrix, and;

$$ F(i) = -A(i-1)^{-1} B(i/i-1) \tag{A.12} $$

$$ G(i) = A(i/i-1) - B(i/i-1)^T A(i-1)^{-1} B(i/i-1) = A(i/i-1) + B(i/i-1)^T F(i) \tag{A.13} $$

$$ g(i) = h(i/i-1) - B(i/i-1)^T p(i-1) = h(i/i-1) + F(i)^T h(i-1) \tag{A.15} $$

$$ k(i) = G(i)^{-1} g(i) \tag{A.16} $$

The matrix F(i) given by Eq. {A.12} requires the use of
the inverse of matrix A(i-1). If this was not explicitly
solved for previously, it is needed at this point. The
vector k(i) given by Eq. {A.16} requires the use of the
inverse of matrix G(i).

In 1947, Norman Levinson published a paper in which he
"in order to facilitate computational procedure, worked out
an approximate, and one might say, mathematically trivial
procedure" [Ref. 1; pp. 161]. Levinson was working in
conjunction with Norbert Wiener on a problem involving
linear moving average filter design using a least squares
fit. This required solving a set of simultaneous linear
equations of the form of Eq. {B.1}. Levinson developed an
iterative procedure for obtaining p(i) based on the
following conditions (given in terms of our notation).

(1)  Matrix A(i) is symmetric, and can be
represented in the form of Eq. {A.6}.   {B.2}

(2)  The solution for p(i-1) has been previously
obtained.   {B.3}

(3)  A(i/i-1) is a 1 x 1 matrix (scalar), a(i/i-1).   {B.4}

Later, Durbin [Ref. 2] simplified the solution for p(i) by
adding a fourth condition.

(4)  B(i/i-1) reduces to a vector b(i/i-1) which
equals the vector h(i-1) in reverse order.   {B.5}

Despite this separate and later simplification by Durbin,
the procedure is commonly referred to as Levinson's
algorithm, and it can be shown that;

$$ p(i) = \begin{bmatrix} p(i-1) \\ 0 \end{bmatrix} + \begin{bmatrix} f(i) \\ 1 \end{bmatrix} k(i) \tag{B.6} $$


The main property of this algorithm is that calculation of
the vector f(i) and the scalar k(i) does not require any
matrix inversions. Additionally, the number of
multiplicative operations required for the solution of Eq.
{B.6} is 2c(i-1)+1, where c(i)-1 = c(i-1), and c(i-1) is the
size of matrix A(i-1). The popularity of Levinson's
algorithm is a result of this very small computational cost,
which offers a significant savings over the direct matrix
solution of Eq. {B.1} by the more conventional techniques.

We demonstrate that {B.6} is a special case of {A.18}
based on the four conditions {B.2}, {B.3}, {B.4}, and {B.5}
presented above. We first develop an expression for the
factor f(i) in Eq. {B.6}. From {B.5} we can write

$$ B(i/i-1) = h(i-1) \text{ in reverse order} = \tilde{h}(i-1) \tag{B.7} $$

where the "~" denotes an end-for-end reversal.
Substituting {B.7} into {A.12} produces;

$$ F(i) = -A(i-1)^{-1} B(i/i-1) = -A(i-1)^{-1} \tilde{h}(i-1) \tag{B.8} $$

From the symmetric condition {B.2}, together with the Toeplitz
structure that A(i-1) must have for Levinson's algorithm,
Eq. {A.9} can be written with both vectors reversed;

$$ A(i-1)\,\tilde{p}(i-1) = \tilde{h}(i-1) \tag{B.9} $$

Substituting {B.9} into {B.8} and simplifying yields;

$$ F(i) = -A(i-1)^{-1} A(i-1)\,\tilde{p}(i-1) = -\tilde{p}(i-1) \tag{B.10} $$

From {B.10}, we see that F(i) is obtained directly from the
previous solution p(i-1) without any computational cost.

We now develop an equation for k(i) in Eq. {B.6}.
Substitute {B.4}, {B.7}, and {B.10} into {A.13} and solve
for the 1 x 1 matrix G(i).

$$ G(i) = A(i/i-1) + B(i/i-1)^T F(i) = a(i/i-1) - \tilde{h}(i-1)^T \tilde{p}(i-1) \tag{B.11} $$

The solution of Eq. {B.11} requires c(i-1) multiplications.
Substituting {B.10} into {A.15} and simplifying yields;

$$ g(i) = h(i/i-1) + F(i)^T h(i-1) = h(i/i-1) - \tilde{p}(i-1)^T h(i-1) \tag{B.12} $$

The solution of Eq. {B.12} requires c(i-1) multiplications.

Next substitute {B.11} and {B.12} into {A.16}.

$$ k(i) = G(i)^{-1} g(i) = \left[ a(i/i-1) - \tilde{h}(i-1)^T \tilde{p}(i-1) \right]^{-1} \left[ h(i/i-1) - \tilde{p}(i-1)^T h(i-1) \right] \tag{B.13} $$

which is a scalar.

Therefore k(i) requires 2c(i-1)+1 multiplications, and does
not involve any matrix inversion, other than the trivial
scalar inversion of the 1 x 1 matrix G(i).

We see that under the four stated conditions, the
general recursive solution algorithm {A.18} reduces to
Levinson's algorithm, and has the same computational cost.
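
The special case above can be sketched in a few lines of Python. The function name `durbin_solve` and the sample autocorrelation values are assumptions made for this illustration. The sketch assumes the Toeplitz setting in which A(i) has entries r[|m-n|] and h(i) = [r[1], ..., r[c(i)]], so that conditions {B.2} through {B.5} hold at every step.

```python
def durbin_solve(r):
    """Solve A p = h, where A[m][n] = r[|m-n|] and h = [r[1], ..., r[n]].

    Each pass grows the solution by one term using {B.10} to {B.13}
    and the update {B.6}; no matrix is ever inverted.
    """
    n = len(r) - 1
    p = [r[1] / r[0]]                      # size-1 solution
    for m in range(1, n):                  # grow from size m to size m+1
        h_prev = r[1:m + 1]                # h(i-1)
        p_rev = p[::-1]                    # end-for-end reversal of p(i-1)
        h_rev = h_prev[::-1]               # end-for-end reversal of h(i-1)
        G = r[0] - sum(a * b for a, b in zip(h_rev, p_rev))       # {B.11}
        g = r[m + 1] - sum(a * b for a, b in zip(p_rev, h_prev))  # {B.12}
        k = g / G                                                 # {B.13}
        # {B.6} with f(i) = F(i) = -(reversed p(i-1)) from {B.10}
        p = [pj - k * pr for pj, pr in zip(p, p_rev)] + [k]
    return p


# Hypothetical autocorrelation values r[0], r[1], r[2], r[3].
r = [1.0, 0.5, 0.2, 0.1]
p = durbin_solve(r)
```

Each pass costs two inner products of length c(i-1) and one scalar division to obtain k(i), which is the 2c(i-1)+1 count quoted above.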

The simplification of Levinson's algorithm is critically
dependent upon conditions {B.4} and {B.5}. This first
condition, that A(i/i-1) is a 1 x 1 matrix, restricts
Levinson's algorithm to be a single-step iterative technique
(one increase in size of matrix A(i) over A(i-1)). This
limits model growth to only one new term at a time in the
general modeling problem. The second critical condition,
that the transpose of B(i/i-1) equals h(i-1) in reverse
order, restricts Levinson's algorithm both to the limited
cases of model growth that add terms that are delayed
versions of existing terms, and to the use of the
Autocorrelation error minimization method that produces
least squares normal equations with this special structure.
Note that matrix A(j) for j=1,2,...,i has to be Toeplitz,
and the vector h(j) must satisfy condition {B.5}.

Multiple channel versions of Levinson's Algorithm have
been proposed [Ref. 3 - 9], but these all require the
special "recursive-in-order" relationship between the model
terms represented in A(i-1), and those terms represented in
A(i). This is an unnecessary and suboptimal restriction for
the general model growth problem.

APPENDIX C.   DETAILS OF ORDER OF COMPLEXITY CALCULATIONS

USED TO COMPARE THE GROWTH TECHNIQUES

Using the same convention as Chapter VI, we denote the
size c(i-1) of the (i-1)st model as P, and the number of
data points in the error minimization as N. When
distinction is needed, we use the shorthand "Step D1", "Step
B1", or "Step S1" to indicate that we are referring to Step 1
of the direct least squares technique, block-form recursive
technique, or search indicator growth technique,
respectively. When the computational cost is the same for
all three techniques (e.g. steps 1 through 7), we use just
the step number.

The computational cost for each of the first seven steps
is developed as follows. Each of the three growth
techniques uses the identical first seven steps.
Step 1:   Set i = 1, and form the term vector x(n,i). No cost.

Step 2:   Form R(1) using Eq. {4.5}

$$ R(i) = \frac{1}{N} X(i)^T X(i) \tag{4.5} $$

Since X(i) is a N x c(i) matrix, each element in the
c(i) x c(i) matrix R(i) requires N multiplications
and 1 division operation. Because of symmetry,
there are c(i)[c(i)+1]/2 elements in matrix R(i)
that must be calculated. Therefore the computational
cost is [N+1]c(i)[c(i)+1]/2.

Step 3:   Form r(i) using Eq. {4.6}

$$ r(i) = \frac{1}{N} X(i)^T z \tag{4.6} $$

Each of the c(i) elements in vector r(i) requires N
multiplications and 1 division, therefore the
computational cost is c(i)[N+1].

Step 4:   Invert R(i)

Since R(i) is a symmetric matrix of size c(i), it
can be inverted at a cost of [c(i)**3]/6 operations.

Step 5:   Solve for $J^2(i)$ using Eq. {4.3}

$$ J^2(i) = \frac{1}{N} z^T z - r(i)^T R(i)^{-1} r(i) \tag{4.3} $$

Since z is a size N vector, the first term on the
right side of {4.3} can be computed with N+1
operations. Using the preceding definitions for the
sizes of vector r(i) and matrix R(i), the second
term on the right side of {4.3} requires
[c(i)**2] + c(i) operations. The total cost for
this step is therefore [c(i)**2] + c(i) + N + 1.

Step 6 and Step 7 do not involve any computational cost.

Adding costs results in the following complexity equation
for the direct least squares technique.

O(n) = [c(i)**3]/6 + [c(i)**2][N+3]/2 + c(i)[3N+5]/2 + N + 1   {C.1}
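
The sum leading to {C.1} can be checked mechanically. The short Python sketch below adds the costs of steps 2 through 5 and compares them with the closed form; the function names are hypothetical and exact rational arithmetic is used so that the divisions by 2 and 6 do not round.

```python
from fractions import Fraction


def direct_cost_by_step(c, N):
    """Sum of the per-step costs of the direct least squares technique."""
    step2 = Fraction((N + 1) * c * (c + 1), 2)   # form R(i)
    step3 = c * (N + 1)                          # form r(i)
    step4 = Fraction(c ** 3, 6)                  # invert R(i)
    step5 = c ** 2 + c + N + 1                   # evaluate J^2(i)
    return step2 + step3 + step4 + step5


def direct_cost_closed_form(c, N):
    """Closed form of Eq. {C.1}."""
    return (Fraction(c ** 3, 6)
            + Fraction(c ** 2 * (N + 3), 2)
            + Fraction(c * (3 * N + 5), 2)
            + N + 1)


# Illustrative sizes only: a 5-term model fitted to 200 data points.
total = direct_cost_closed_form(5, 200)
```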
We continue with the computational cost of the block-form
technique.

Step B8:   Form A(i/i-1) using Eq. {4.27}

$$ A(i/i-1) = \frac{1}{N} W(i/i-1)^T W(i/i-1) \tag{4.27} $$

The N x q(i) matrix W(i/i-1) is the data matrix for
the new model terms. Each element in matrix
A(i/i-1) requires N multiplications and 1 division
operation. Because of symmetry, there are
q(i)[q(i)+1]/2 elements that must be calculated,
therefore the computational cost is
[N+1]q(i)[q(i)+1]/2.

Step B9:   Form B(i/i-1) using Eq. {4.26}

$$ B(i/i-1) = \frac{1}{N} W(i-1)^T W(i/i-1) \tag{4.26} $$

The N x P matrix W(i-1) is the data matrix for the
(i-1)st model obtained previously. Each element in
matrix B(i/i-1) requires N multiplications and 1
division operation. Since there are q(i)P elements
that must be calculated, the computational cost is
[N+1]q(i)P.

Step B10:   Form h(i/i-1) using Eq. {4.29}

$$ h(i/i-1) = \frac{1}{N} W(i/i-1)^T z \tag{4.29} $$

Using the preceding definitions of the sizes of
matrix W(i/i-1) and vector z, each of the q(i)
elements in vector h(i/i-1) requires N
multiplications and 1 division operation. Therefore
the computational cost is q(i)[N+1].

Step B11:   Form F(i) using Eq. {4.30}

$$ F(i) = -A(i-1)^{-1} B(i/i-1) \tag{4.30} $$

The P x P matrix $A(i-1)^{-1}$ is the inverse of the
least squares matrix for the (i-1)st model obtained
previously. Since matrix B(i/i-1) is P x q(i), each
element in matrix F(i) requires P multiplications.
There are q(i)P elements in matrix F(i) and
therefore the computational cost is q(i)[P**2].

Step B12:   Form G(i) using Eq. {4.31}

$$ G(i) = A(i/i-1) + B(i/i-1)^T F(i) \tag{4.31} $$

Using the preceding definitions of the sizes for
matrices B(i/i-1) and F(i), each element in the
result of the second term on the right side of
{4.31} requires P multiplications.
Since there are [q(i)**2] elements in this resulting
matrix, the total computational cost is P[q(i)**2].

Step B13:   Form g(i) using Eq. {4.32}

$$ g(i) = h(i/i-1) + F(i)^T h(i-1) \tag{4.32} $$

Using the preceding definitions of the sizes for
matrix F(i) and vector h(i/i-1), each of the q(i)
elements in the result of the second term on the
right side of {4.32} requires P multiplications.
Therefore, the total computational cost is Pq(i).

Step B14:   Invert G(i)

Since G(i) is a symmetric matrix of size q(i), it
can be inverted at a cost of [q(i)**3]/6 operations.

Step B15:   Form k(i) using Eq. {4.33}

$$ k(i) = G(i)^{-1} g(i) \tag{4.33} $$

Since the inverse matrix $G(i)^{-1}$ is size q(i) x q(i),
and g(i) is a q(i) size vector, each of the q(i)
elements of vector k(i) requires q(i)
multiplications. Therefore, the computational cost
is [q(i)**2].

Step B16:   Solve for $J^2(i)$ using Eq. {4.36}

$$ J^2(i) = J^2(i-1) - g(i)^T k(i) \tag{4.36} $$

Since g(i) and k(i) are both size q(i) column
vectors, the computational cost is q(i).

Step B17 does not involve any computational cost.

Based on the results of step B17, the growth may stop.
Adding the costs of each step results in the
following complexity equation for the block-form technique
(steps B8 through B17).

O(n) = [q(i)**3]/6 + [2P+N+3][q(i)**2]/2

+ q(i)[NP + [P**2] + 2P + [3N+5]/2]   {C.2}

If additional growth iterations are required for
adequate modeling performance, one additional computational
step is required before starting again at step B7.

Step B18:   Form inverse of A(i) using Eq. {4.37}

$$ A(i)^{-1} = \begin{bmatrix} A(i-1)^{-1} + F(i)G(i)^{-1}F(i)^T & F(i)G(i)^{-1} \\ G(i)^{-1}F(i)^T & G(i)^{-1} \end{bmatrix} \tag{4.37} $$

All of the indicated matrices in {4.37} have already
been calculated. The only computations required are
those necessary to form the matrix factors
$F(i)G(i)^{-1}F(i)^T$ and $F(i)G(i)^{-1}$. Using the
previously defined sizes of these matrices, these
factors can both be calculated with a total cost of
q(i)[P**2] + P[q(i)**2].
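
For the single-term case q(i) = 1, G(i) is a scalar and the inverse update of Eq. {4.37} needs no matrix inversion at all. The Python sketch below illustrates this case only; the function name `grow_inverse_one_term`, the tolerance, and the numerical values are assumptions made for the illustration.

```python
def grow_inverse_one_term(A_prev_inv, b, a):
    """Return A(i)^-1 from A(i-1)^-1 when one term is added (q(i) = 1).

    b is the new off-diagonal column B(i/i-1) and a is the new
    diagonal entry A(i/i-1).
    """
    P = len(b)
    # F(i) = -A(i-1)^-1 B(i/i-1), a length-P vector in this case
    F = [-sum(A_prev_inv[r][c] * b[c] for c in range(P)) for r in range(P)]
    # G(i) = A(i/i-1) + B(i/i-1)^T F(i), a scalar in this case
    G = a + sum(bj * fj for bj, fj in zip(b, F))
    if abs(G) < 1e-12:
        raise ValueError("G(i) is numerically zero: new term is collinear")
    # Upper-left block plus the right-hand column F(i) G(i)^-1
    top = [[A_prev_inv[r][c] + F[r] * F[c] / G for c in range(P)] + [F[r] / G]
           for r in range(P)]
    # Bottom row: G(i)^-1 F(i)^T followed by G(i)^-1
    bottom = [f / G for f in F] + [1.0 / G]
    return top + [bottom]


# Hypothetical example: A(i-1) = [[4, 1], [1, 3]], whose inverse is known exactly.
A1_inv = [[3.0 / 11.0, -1.0 / 11.0], [-1.0 / 11.0, 4.0 / 11.0]]
A2_inv = grow_inverse_one_term(A1_inv, [0.5, 0.2], 2.0)
```

A zero value of G(i) signals that the candidate term is a linear combination of the terms already in the model, which is why the sketch refuses to divide by it.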
We continue with the computational costs of the search
indicator growth technique. The cost of step 1 through step
7 is the same as for the first two growth techniques.

Step S8:   Form I(j,12) for terms in w(n,i/i-1) using
the definition of Eq. {6.29}, repeated here;

$$ I(j,12) = \frac{\left[ \frac{1}{N}\, w_j(i/i-1)^T e(i-1) \right]^2}{\frac{1}{N}\, w_j(i/i-1)^T w_j(i/i-1)} \tag{6.29} $$

Using the computational results of Table 5 and Table
6, the cost of I(j,12) for the first term is NP+2N+4,
and the cost for each of the second through the
q(i)th term is 2N+4. Therefore, the total cost for
the q(i) indicators I(j,12) is PN + [2N+4]q(i).

Step S9 involves selecting the subset of terms with
values of I(j,12) greater than a specified heuristic
threshold level. Depending on the value of this threshold
and the values of I(j,12), there will be an integer number
k of terms left. The size of the term vector w(n,i/i-1) is
reduced from q(i) x 1 to k x 1. There is no significant
cost of this step.
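
Steps S8 and S9 can be sketched as follows. The sketch assumes the indicator has the form [(1/N) w_j^T e(i-1)]^2 / [(1/N) w_j^T w_j], which is a reading of Eq. {6.29} consistent with the 2N+4 operations per term quoted above; that reading, the function names, and the sample data are all assumptions made for this illustration.

```python
def search_indicators(W_cand, e_prev):
    """Indicator value for each candidate term.

    W_cand is a list of candidate data columns w_j (each of length N)
    and e_prev is the residual vector e(i-1) of the current model.
    """
    N = len(e_prev)
    values = []
    for w in W_cand:
        num = (sum(wi * ei for wi, ei in zip(w, e_prev)) / N) ** 2
        den = sum(wi * wi for wi in w) / N
        values.append(num / den if den > 0.0 else 0.0)
    return values


def select_terms(W_cand, e_prev, threshold):
    """Indices of the candidates whose indicator exceeds the threshold."""
    values = search_indicators(W_cand, e_prev)
    return [j for j, v in enumerate(values) if v > threshold]


# Hypothetical residual and three candidate columns (N = 4).
e_prev = [1.0, -1.0, 1.0, -1.0]
W_cand = [
    [1.0, -1.0, 1.0, -1.0],   # perfectly aligned with the residual
    [1.0, 1.0, 1.0, 1.0],     # orthogonal to the residual
    [2.0, -2.0, 0.0, 0.0],    # partially aligned
]
kept = select_terms(W_cand, e_prev, 0.4)
```

Only the k retained columns are passed on to the later steps, so the more expensive matrix operations are carried out on k terms rather than on all q(i) candidates.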
Step S10:   Form A(i/i-1) using the reduced vector
w(n,i/i-1) in Eq. {4.23} and substituting
this into Eq. {4.27} to form;

$$ A(i/i-1) = \frac{1}{N} W(i/i-1)^T W(i/i-1) \tag{4.27} $$

The N x k matrix W(i/i-1) is now the reduced data
matrix for the new model terms. Each element in
matrix A(i/i-1) requires N multiplications and 1
division operation. Because of symmetry, there are
k[k+1]/2 elements that must be calculated, therefore
the computational cost is [N+1]k[k+1]/2.

Step S11:   Form B(i/i-1) using Eq. {4.26}

$$ B(i/i-1) = \frac{1}{N} W(i-1)^T W(i/i-1) \tag{4.26} $$

The N x P matrix W(i-1) is the data matrix for the
(i-1)st model obtained previously. Each element in
matrix B(i/i-1) requires N multiplications and 1
division operation. Since there are kP elements
that must be calculated, the computational cost is
[N+1]kP.

Step S12:   Form h(i/i-1) using Eq. {4.29}

$$ h(i/i-1) = \frac{1}{N} W(i/i-1)^T z \tag{4.29} $$

Using the preceding definitions of the sizes of
matrix W(i/i-1) and vector z, each of the k elements
in vector h(i/i-1) requires N multiplications and 1
division operation. Total cost is k[N+1].

Step S13:   Form F(i) using Eq. {4.30}

$$ F(i) = -A(i-1)^{-1} B(i/i-1) \tag{4.30} $$

The P x P matrix $A(i-1)^{-1}$ is the inverse of the least
squares matrix for the (i-1)st model obtained
previously. Since matrix B(i/i-1) is P x k, each
element in matrix F(i) requires P multiplications.
There are kP elements in matrix F(i) and therefore
the computational cost is k[P**2].

Step S14:   Form G(i) using Eq. {4.31}

$$ G(i) = A(i/i-1) + B(i/i-1)^T F(i) \tag{4.31} $$

Using the preceding definitions for the sizes of
matrices B(i/i-1) and F(i), each element in the
result of the second term on the right side of
{4.31} requires P multiplications. Since there are
[k**2] elements in this resulting matrix, the total
computational cost is P[k**2].

Step S15:   Form g(i) using Eq. {4.32}

$$ g(i) = h(i/i-1) + F(i)^T h(i-1) \tag{4.32} $$

Using the preceding definitions of the sizes for
matrix F(i) and vector h(i/i-1), each of the k
elements in the result of the second term on the
right side of {4.32} requires P multiplications.
Therefore, the total computational cost is Pk.

Step S16:   Invert G(i)

Since G(i) is a symmetric matrix of size k, it can
be inverted at a cost of [k**3]/6 operations.

Step S17:   Form vector k(i) using Eq. {4.33}

$$ k(i) = G(i)^{-1} g(i) \tag{4.33} $$

Since the inverse matrix $G(i)^{-1}$ is size k x k, and
g(i) is a k size vector, each of the k elements of
vector k(i) requires k multiplications. Therefore,
the computational cost is [k**2].

Step S18:   Solve for $J^2(i)$ using Eq. {4.36}

$$ J^2(i) = J^2(i-1) - g(i)^T k(i) \tag{4.36} $$

Since g(i) and k(i) are both size k column vectors,
the computational cost is k multiplications.

Step S19 does not involve any computational cost.

Based on the results of step S19, the growth may stop.
Adding the costs of each step results in the
following complexity equation for the search indicator
growth technique (steps S8 through S19).

O(n) = [k**3]/6 + [2P+N+3][k**2]/2 + k[NP + [P**2] + 2P + [3N+5]/2]

+ NP + [2N+4]q(i)   {C.3}
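
To give a feel for the trade-off between the two growth techniques, the Python sketch below adds up the per-step costs listed in this appendix for a block-form growth with all q(i) candidates and for a search indicator growth that retains only k of them. The function names and the sample sizes are hypothetical, and the resulting numbers are illustrative operation counts rather than measured results.

```python
from fractions import Fraction


def growth_steps_cost(m, P, N):
    """Summed per-step cost of adding m terms to a size-P model."""
    return (Fraction((N + 1) * m * (m + 1), 2)   # form A(i/i-1)
            + (N + 1) * m * P                    # form B(i/i-1)
            + m * (N + 1)                        # form h(i/i-1)
            + m * P ** 2                         # form F(i)
            + P * m ** 2                         # form G(i)
            + P * m                              # form g(i)
            + Fraction(m ** 3, 6)                # invert G(i)
            + m ** 2                             # form k(i)
            + m)                                 # evaluate J^2(i)


def block_form_cost(q, P, N):
    """Steps B8 through B17 with all q candidate terms."""
    return growth_steps_cost(q, P, N)


def search_indicator_cost(q, k, P, N):
    """Step S8 screening of q candidates, then growth with k retained terms."""
    screening = N * P + (2 * N + 4) * q
    return screening + growth_steps_cost(k, P, N)


# Illustrative sizes: 5 existing terms, 200 data points,
# 20 candidate terms of which 3 pass the screening step.
P, N, q, k = 5, 200, 20, 3
block = block_form_cost(q, P, N)
search = search_indicator_cost(q, k, P, N)
```

When every candidate survives the screening (k = q), the search indicator technique costs more than the block-form technique by exactly the screening overhead NP + [2N+4]q(i); the savings come from cases where k is much smaller than q(i).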

If additional growth iterations are required for
adequate modeling performance, one additional computational
step is required before starting again at step S7.

Step B18:   Form inverse of A(i) using Eq. {4.37}

$$ A(i)^{-1} = \begin{bmatrix} A(i-1)^{-1} + F(i)G(i)^{-1}F(i)^T & F(i)G(i)^{-1} \\ G(i)^{-1}F(i)^T & G(i)^{-1} \end{bmatrix} \tag{4.37} $$

All of the indicated matrices in {4.37} have already
been calculated. The only computations required are
those necessary to form the matrix factors
$F(i)G(i)^{-1}F(i)^T$ and $F(i)G(i)^{-1}$. Using the
previously defined sizes of these matrices, these
factors can both be calculated with a total cost of
k[P**2] + P[k**2].